Amazon DynamoDB vs Kafka: What are the differences?
What is Amazon DynamoDB? Fully managed NoSQL database service. All data items are stored on Solid State Drives (SSDs), and are replicated across 3 Availability Zones for high availability and durability. With DynamoDB, you can offload the administrative burden of operating and scaling a highly available distributed database cluster, while paying a low price for only what you use.
What is Kafka? Distributed, fault tolerant, high throughput pub-sub messaging system. Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design.
Amazon DynamoDB and Kafka are primarily classified as "NoSQL Database as a Service" and "Message Queue" tools respectively.
Some of the features offered by Amazon DynamoDB are:
- Automated Storage Scaling – There is no limit to the amount of data you can store in a DynamoDB table, and the service automatically allocates more storage, as you store more data using the DynamoDB write APIs.
- Provisioned Throughput – When creating a table, simply specify how much request capacity you require. DynamoDB allocates dedicated resources to your table to meet your performance requirements, and automatically partitions data over a sufficient number of servers to meet your request capacity. If your throughput requirements change, simply update your table's request capacity using the AWS Management Console or the Amazon DynamoDB APIs. You are still able to achieve your prior throughput levels while scaling is underway.
- Fully Distributed, Shared Nothing Architecture – Amazon DynamoDB scales horizontally and can seamlessly scale a single table over hundreds of servers.
On the other hand, Kafka provides the following key features:
- Written at LinkedIn in Scala
- Used by LinkedIn to offload processing of all page and other views
- Defaults to using persistence, uses OS disk cache for hot data (has higher throughput then any of the above having persistence enabled)
"Predictable performance and cost" is the primary reason why developers consider Amazon DynamoDB over the competitors, whereas "High-throughput" was stated as the key factor in picking Kafka.
Kafka is an open source tool with 12.7K GitHub stars and 6.81K GitHub forks. Here's a link to Kafka's open source repository on GitHub.
According to the StackShare community, Kafka has a broader approval, being mentioned in 509 company stacks & 470 developers stacks; compared to Amazon DynamoDB, which is listed in 444 company stacks and 187 developer stacks.
What is Amazon DynamoDB?
What is Kafka?
Need advice about which tool to choose?Ask the StackShare community!
Sign up to add, upvote and see more prosMake informed product decisions
Sign up to get full access to all the companiesMake informed product decisions
Sign up to get full access to all the tool integrationsMake informed product decisions
Why we built Marmaray, an open source generic data ingestion and dispersal framework and library for Apache Hadoop :
Built and designed by our Hadoop Platform team, Marmaray is a plug-in-based framework built on top of the Hadoop ecosystem. Users can add support to ingest data from any source and disperse to any sink leveraging the use of Apache Spark . The name, Marmaray, comes from a tunnel in Turkey connecting Europe and Asia. Similarly, we envisioned Marmaray within Uber as a pipeline connecting data from any source to any sink depending on customer preference:
(Direct GitHub repo: https://github.com/uber/marmaray Kafka Kafka Manager )
I use Kafka because it has almost infinite scaleability in terms of processing events (could be scaled to process hundreds of thousands of events), great monitoring (all sorts of metrics are exposed via JMX).
Downsides of using Kafka are: - you have to deal with Zookeeper - you have to implement advanced routing yourself (compared to RabbitMQ it has no advanced routing)
The question for which Message Queue to use mentioned "availability, distributed, scalability, and monitoring". I don't think that this excludes many options already. I does not sound like you would take advantage of Kafka's strengths (replayability, based on an even sourcing architecture). You could pick one of the AMQP options.
I would recommend the RabbitMQ message broker, which not only implements the AMQP standard 0.9.1 (it can support 1.x or other protocols as well) but has also several very useful extensions built in. It ticks the boxes you mentioned and on top you will get a very flexible system, that allows you to build the architecture, pick the options and trade-offs that suite your case best.
For more information about RabbitMQ, please have a look at the linked markdown I assembled. The second half explains many configuration options. It also contains links to managed hosting and to libraries (though it is missing Python's - which should be Puka, I assume).
I used Kafka originally because it was mandated as part of the top-level IT requirements at a Fortune 500 client. What I found was that it was orders of magnitude more complex ...and powerful than my daily Beanstalkd , and far more flexible, resilient, and manageable than RabbitMQ.
So for any case where utmost flexibility and resilience are part of the deal, I would use Kafka again. But due to the complexities involved, for any time where this level of scalability is not required, I would probably just use Beanstalkd for its simplicity.
I tend to find RabbitMQ to be in an uncomfortable middle place between these two extremities.
I use Amazon DynamoDB because it integrates seamlessly with other AWS SaaS solutions and if cost is the primary concern early on, then this will be a better choice when compared to AWS RDS or any other solution that requires the creation of a HA cluster of IaaS components that will cost money just for being there, the costs not being influenced primarily by usage.
Front-end messages are logged to Kafka by our API and application servers. We have batch processing (on the middle-left) and real-time processing (on the middle-right) pipelines to process the experiment data. For batch processing, after daily raw log get to s3, we start our nightly experiment workflow to figure out experiment users groups and experiment metrics. We use our in-house workflow management system Pinball to manage the dependencies of all these MapReduce jobs.
For most of the stuff we use MySQL. We just use Amazon RDS. But for some stuff we use Amazon DynamoDB. We love DynamoDB. It's amazing. We store usage data in there, for example. I think we have close to seven or eight hundred million records in there and it's scaled like you don't even notice it. You never notice any performance degradation whatsoever. It's insane, and the last time I checked we were paying $150 bucks for that.
zerotoherojs.com ’s userbase, and course details are stored in DynamoDB tables.
The good thing about AWS DynamoDB is: For the amount of traffic that I have, it is free. It is highly-scalable, it is managed by Amazon, and it is pretty fast.
It is, again, one less thing to worry about (when compared to managing your own MongoDB elsewhere).
We store customer metadata in DynamoDB. We decided to use Amazon DynamoDB because it was a fully managed, highly available solution. We didn't want to operate our own SQL server and we wanted to ensure that we built CloudRepo on high availability components so that we could pass that benefit back to our customers.
몇몇 로그는 현재 AWS DynamoDB 에 기록되고 있습니다. 개선을 통해 mongodb 로 옮길 계획을 하고 있습니다. 아주 간단한 데이터를 쌓는 용도로는 나쁘지 않습니다. 다만, 쿼리가 아주 제한적입니다. 사용하기 전에 반드시 DynamoDB 의 스펙을 확인할 필요가 있습니다.
Building out real-time streaming server to present data insights to Coolfront Mobile customers and internal sales and marketing teams.
To store device health records as it allows super fast writes and range queries.