Amazon S3 vs Kafka

Overview

Amazon S3

Stacks: 55.1K

Followers: 40.2K

Votes: 2.0K

Kafka

Stacks: 24.2K

Followers: 22.3K

Votes: 607

GitHub Stars: 31.2K

Forks: 14.8K

Amazon S3 vs Kafka: What are the differences?

Amazon S3: Amazon Simple Storage Service (S3) is a scalable storage service offered by Amazon Web Services (AWS). It provides developers with the ability to store and retrieve any amount of data at any time, from anywhere on the web. S3 is designed to provide durability, availability, and scalability for data storage needs.
Kafka: Apache Kafka is a distributed streaming platform used for building real-time data pipelines and streaming applications. It is designed to handle high volumes of data, provide fault tolerance, and support real-time processing of data streams. Kafka is widely used for building event-driven architectures and processing real-time data feeds.
Storage vs Messaging: The key difference between S3 and Kafka lies in their primary use cases. S3 is primarily used for storage purposes, offering a scalable and durable solution for storing and retrieving objects like files, images, and videos. On the other hand, Kafka is designed for messaging and real-time data streaming, providing a publish-subscribe model to handle streams of data.
Data Persistence: S3 is built for long-term persistence: objects stay available until they are deleted or expired by a lifecycle rule. Kafka also writes messages to disk and replicates them across brokers, but retention is governed by per-topic time or size limits, so it is usually treated as a durable buffer for data in motion rather than as a long-term archive.
Retrieval and Querying: In S3, objects are written and read through an HTTP API with simple key-value semantics: a bucket name plus a key identifies one object. Storing and fetching are simple and efficient, but querying across objects requires additional tools. In Kafka, records are appended to topics, and consumers subscribe to topics and read records sequentially, tracking their position by offset. Kafka does not offer ad hoc queries over stored records; filtering, joining, and aggregation are done by stream processing on top of it, for example with Kafka Streams.
Event-driven Architecture: Kafka is commonly used in event-driven architectures, where it acts as a messaging system that enables the flow of data between different components and services. S3, on the other hand, is not designed for real-time event-driven architectures but rather for reliable and scalable storage of objects.
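
The contrast in access patterns above can be sketched with two toy in-memory classes. This is only an illustration under stated assumptions: `ToyObjectStore` and `ToyTopic` are hypothetical names, not part of any AWS or Kafka client library, and they ignore networking, replication, partitions, and object versioning. The point is the semantics: a key addresses one replaceable object, while a topic is an append-only log that each consumer group reads at its own offset.

```python
from dataclasses import dataclass, field


@dataclass
class ToyObjectStore:
    """Hypothetical stand-in for an object store: one value per (bucket, key)."""
    _objects: dict = field(default_factory=dict)

    def put(self, bucket: str, key: str, body: bytes) -> None:
        # In this toy model, writing to an existing key replaces the object.
        self._objects[(bucket, key)] = body

    def get(self, bucket: str, key: str) -> bytes:
        return self._objects[(bucket, key)]


@dataclass
class ToyTopic:
    """Hypothetical stand-in for a single-partition topic: an append-only log."""
    _log: list = field(default_factory=list)
    _offsets: dict = field(default_factory=dict)  # consumer group -> next offset

    def publish(self, message: bytes) -> int:
        self._log.append(message)
        return len(self._log) - 1  # offset assigned to the new record

    def poll(self, group: str, max_records: int = 10) -> list:
        # Each group keeps its own position; reading does not remove records.
        start = self._offsets.get(group, 0)
        batch = self._log[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


store = ToyObjectStore()
store.put("media", "avatars/42.png", b"v1")
store.put("media", "avatars/42.png", b"v2")  # overwrites the earlier object

topic = ToyTopic()
topic.publish(b"uploaded avatars/42.png")
topic.publish(b"resized avatars/42.png")

billing_batch = topic.poll("billing")  # both records, in publish order
audit_batch = topic.poll("audit")      # an independent group sees them too
```

Because polling only advances a per-group offset, a second `poll("billing")` returns an empty batch until something new is published, while the object store always returns the latest value for a key.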

In summary, Amazon S3 is a scalable storage service primarily used for storing and retrieving objects, while Kafka is a distributed streaming platform focused on real-time data processing and messaging. S3 provides long-term data persistence and retrieval with simple key-value semantics, while Kafka enables high-throughput real-time streaming and messaging in event-driven architectures.


Advice on Amazon S3, Kafka

viradiya

Apr 12, 2020

Needs advice on AngularJS, ASP.NET Core, MSSQL

We are going to develop a microservices-based application. It consists of AngularJS, ASP.NET Core, and MSSQL.

We have 3 types of microservices: Emailservice, Filemanagementservice, and Filevalidationservice.

I am a beginner in microservices. I have read about RabbitMQ, but I have since learned that Redis and Kafka are also options, so I want to know which is best.

934k views

Ishfaq

Feb 28, 2020

Needs advice

Our backend application sends external messages to a third-party application at the end of each backend (CRUD) API call made from the UI. These external messages take considerable extra time (building the message, processing it, sending it to the third party, and logging success or failure), and the UI application has no interest in them.

So currently we send these third-party messages from a new child thread created at the end of each REST API call, so that the UI application doesn't wait for the extra third-party API calls.

I want to integrate Apache Kafka for these third-party API calls so that failed calls can be queued and retried (currently the messages are sent from multiple threads at the same time, which uses too much processing power and resources), and to handle logging, etc.

Question 1: Is this a use case of a message broker?

Question 2: If it is, which is better: Kafka or RabbitMQ?

804k views

Mohammad

Aug 30, 2020

Needs advice on Backblaze B2 Cloud Storage, PHP, Laravel

Hello! I have a mobile app with nearly 100k MAU, and I want to add a cloud file storage service to my app.

My app will allow users to store their image, video, and audio files and retrieve them to their device when necessary.

I have already decided to use PHP & Laravel as my backend, and I use Contabo VPS. Now, I need an object storage service for my app, and my options are:

Amazon S3: It sounds to me like the best option, but also the most expensive. It is the closest to my users (MENA region); for the other services, I would have to go to Europe. I am not sure how important this is.
DigitalOcean Spaces: Seems like my best option for price/service, but I am still not sure.
Wasabi: The best price (6 USD/month/TB) and free bandwidth, but I am not sure if it fits my needs, as I want to allow my users to preview audio and video files. They don't recommend their service for streaming videos.
Backblaze B2 Cloud Storage: Good price, but I am not sure about them.
There is also the self-hosted s3 compatible option, but I am not sure about that.

Any thoughts will be helpful. Also, if you think I should post in a different sub, please tell me.

180k views

Detailed Comparison

Amazon S3: Amazon Simple Storage Service provides a fully redundant data storage infrastructure for storing and retrieving any amount of data, at any time, from anywhere on the web.

Amazon S3 features:
Write, read, and delete objects containing from 1 byte to 5 terabytes of data each. The number of objects you can store is unlimited.
Each object is stored in a bucket and retrieved via a unique, developer-assigned key.
A bucket can be stored in one of several Regions. You can choose a Region to optimize for latency, minimize costs, or address regulatory requirements. Amazon S3 is currently available in the US Standard, US West (Oregon), US West (Northern California), EU (Ireland), Asia Pacific (Singapore), Asia Pacific (Tokyo), Asia Pacific (Sydney), South America (Sao Paulo), and GovCloud (US) Regions. The US Standard Region automatically routes requests to facilities in Northern Virginia or the Pacific Northwest using network maps.
Objects stored in a Region never leave the Region unless you transfer them out. For example, objects stored in the EU (Ireland) Region never leave the EU.
Authentication mechanisms are provided to ensure that data is kept secure from unauthorized access. Objects can be made private or public, and rights can be granted to specific users.
Options for secure data upload/download and encryption of data at rest are provided for additional data protection.
Uses standards-based REST and SOAP interfaces designed to work with any Internet-development toolkit.
Built to be flexible so that protocol or functional layers can easily be added. The default download protocol is HTTP. A BitTorrent protocol interface is provided to lower costs for high-scale distribution.
Provides functionality to simplify manageability of data through its lifetime. Includes options for segregating data by buckets, monitoring and controlling spend, and automatically archiving data to even lower cost storage options. These options can be easily administered from the Amazon S3 Management Console.
Reliability backed with the Amazon S3 Service Level Agreement.

Kafka: Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design.

Kafka features:
Written at LinkedIn in Scala
Used by LinkedIn to offload processing of all page and other views
Defaults to using persistence, uses OS disk cache for hot data (has higher throughput than any of the above having persistence enabled)
Supports both online and offline processing
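
To make "distributed, partitioned commit log" more concrete, here is a rough sketch of key-based partitioning. It is a toy model, not Kafka's implementation: the partition count, the `partition_for` and `append` helpers, and the use of CRC32 are assumptions for illustration (Kafka's Java client hashes keys with murmur2), and replication is left out entirely. What it does show is that records sharing a key land in the same partition, so their relative order is preserved.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for one topic


def partition_for(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stable hash of the key, reduced to a partition index.
    # CRC32 is a stand-in here, not the hash Kafka actually uses.
    return zlib.crc32(key) % num_partitions


partitions = defaultdict(list)  # partition index -> append-only list of records


def append(key: bytes, value: bytes) -> tuple:
    """Append a record and return its (partition, offset) position."""
    partition = partition_for(key)
    partitions[partition].append((key, value))
    return partition, len(partitions[partition]) - 1


events = [
    (b"user-1", b"login"),
    (b"user-2", b"login"),
    (b"user-1", b"view"),
    (b"user-1", b"logout"),
]
placements = [append(key, value) for key, value in events]

# Every record for user-1 sits in one partition, in the order it was produced.
user1_partition = partition_for(b"user-1")
user1_values = [v for k, v in partitions[user1_partition] if k == b"user-1"]
```

Ordering is only guaranteed within a partition, which is why the choice of key matters when consumers depend on per-entity ordering.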
Statistics

Metric	Amazon S3	Kafka
GitHub Stars	-	31.2K
GitHub Forks	-	14.8K
Stacks	55.1K	24.2K
Followers	40.2K	22.3K
Votes	2.0K	607
Pros & Cons

Amazon S3
Pros: 590 Reliable; 492 Scalable; 456 Cheap; 329 Simple & easy; 83 Many SDKs
Cons: 7 Permissions take some time to get right; 6 Takes time/work to organize buckets & folders properly; 6 Requires a credit card; 3 Complex to set up

Kafka
Pros: 126 High-throughput; 119 Distributed; 92 Scalable; 86 High-Performance; 66 Durable
Cons: 32 Non-Java clients are second-class citizens; 29 Needs Zookeeper; 9 Operational difficulties; 5 Terrible Packaging

What are some alternatives to Amazon S3, Kafka?

RabbitMQ

RabbitMQ gives your applications a common platform to send and receive messages, and your messages a safe place to live until received.

Celery

Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.

Amazon SQS

Transmit any volume of data, at any level of throughput, without losing messages or requiring other services to be always available. With SQS, you can offload the administrative burden of operating and scaling a highly available messaging cluster, while paying a low price for only what you use.

NSQ

NSQ is a realtime distributed messaging platform designed to operate at scale, handling billions of messages per day. It promotes distributed and decentralized topologies without single points of failure, enabling fault tolerance and high availability coupled with a reliable message delivery guarantee.

Amazon EBS

Amazon EBS volumes are network-attached, and persist independently from the life of an instance. Amazon EBS provides highly available, highly reliable, predictable storage volumes that can be attached to a running Amazon EC2 instance and exposed as a device within the instance. Amazon EBS is particularly suited for applications that require a database, file system, or access to raw block level storage.

ActiveMQ

Apache ActiveMQ is fast, supports many Cross Language Clients and Protocols, comes with easy to use Enterprise Integration Patterns and many advanced features while fully supporting JMS 1.1 and J2EE 1.4. Apache ActiveMQ is released under the Apache 2.0 License.

Google Cloud Storage

Google Cloud Storage allows world-wide storing and retrieval of any amount of data and at any time. It provides a simple programming interface which enables developers to take advantage of Google's own reliable and fast networking infrastructure to perform data operations in a secure and cost effective manner. If expansion needs arise, developers can benefit from the scalability provided by Google's infrastructure.

ZeroMQ

The 0MQ lightweight messaging kernel is a library which extends the standard socket interfaces with features traditionally provided by specialised messaging middleware products. 0MQ sockets provide an abstraction of asynchronous message queues, multiple messaging patterns, message filtering (subscriptions), seamless access to multiple transport protocols and more.

Apache NiFi

An easy to use, powerful, and reliable system to process and distribute data. It supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic.

Azure Storage

Azure Storage provides the flexibility to store and retrieve large amounts of unstructured data, such as documents and media files, with Azure Blobs; structured NoSQL-based data with Azure Tables; and reliable messages with Azure Queues. SMB-based Azure Files can be used for migrating on-premises applications to the cloud.
