Amazon DynamoDB vs Amazon Redshift: What are the differences?
Amazon DynamoDB: Fully managed NoSQL database service. All data items are stored on Solid State Drives (SSDs), and are replicated across 3 Availability Zones for high availability and durability. With DynamoDB, you can offload the administrative burden of operating and scaling a highly available distributed database cluster, while paying a low price for only what you use; Amazon Redshift: Fast, fully managed, petabyte-scale data warehouse service. Redshift makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools. It is optimized for datasets ranging from a few hundred gigabytes to a petabyte or more and costs less than $1,000 per terabyte per year, a tenth the cost of most traditional data warehousing solutions.
Amazon DynamoDB belongs to "NoSQL Database as a Service" category of the tech stack, while Amazon Redshift can be primarily classified under "Big Data as a Service".
Some of the features offered by Amazon DynamoDB are:
- Automated Storage Scaling – There is no limit to the amount of data you can store in a DynamoDB table, and the service automatically allocates more storage, as you store more data using the DynamoDB write APIs.
- Provisioned Throughput – When creating a table, simply specify how much request capacity you require. DynamoDB allocates dedicated resources to your table to meet your performance requirements, and automatically partitions data over a sufficient number of servers to meet your request capacity. If your throughput requirements change, simply update your table's request capacity using the AWS Management Console or the Amazon DynamoDB APIs. You are still able to achieve your prior throughput levels while scaling is underway.
- Fully Distributed, Shared Nothing Architecture – Amazon DynamoDB scales horizontally and can seamlessly scale a single table over hundreds of servers.
On the other hand, Amazon Redshift provides the following key features:
- Optimized for Data Warehousing- It uses columnar storage, data compression, and zone maps to reduce the amount of IO needed to perform queries. Redshift has a massively parallel processing (MPP) architecture, parallelizing and distributing SQL operations to take advantage of all available resources.
- Scalable- With a few clicks of the AWS Management Console or a simple API call, you can easily scale the number of nodes in your data warehouse up or down as your performance or capacity needs change.
- No Up-Front Costs- You pay only for the resources you provision. You can choose On-Demand pricing with no up-front costs or long-term commitments, or obtain significantly discounted rates with Reserved Instance pricing.
"Predictable performance and cost" is the top reason why over 53 developers like Amazon DynamoDB, while over 27 developers mention "Data Warehousing" as the leading cause for choosing Amazon Redshift.
Netflix, Medium, and Lyft are some of the popular companies that use Amazon DynamoDB, whereas Amazon Redshift is used by Lyft, Coursera, and 9GAG. Amazon DynamoDB has a broader approval, being mentioned in 444 company stacks & 187 developers stacks; compared to Amazon Redshift, which is listed in 270 company stacks and 68 developer stacks.
What is Amazon DynamoDB?
What is Amazon Redshift?
Need advice about which tool to choose?Ask the StackShare community!
Sign up to add, upvote and see more prosMake informed product decisions
What are the cons of using Amazon Redshift?
Sign up to get full access to all the companiesMake informed product decisions
Sign up to get full access to all the tool integrationsMake informed product decisions
Looker , Stitch , Amazon Redshift , dbt
We recently moved our Data Analytics and Business Intelligence tooling to Looker . It's already helping us create a solid process for reusable SQL-based data modeling, with consistent definitions across the entire organizations. Looker allows us to collaboratively build these version-controlled models and push the limits of what we've traditionally been able to accomplish with analytics with a lean team.
For Data Engineering, we're in the process of moving from maintaining our own ETL pipelines on AWS to a managed ELT system on Stitch. We're also evaluating the command line tool, dbt to manage data transformations. Our hope is that Stitch + dbt will streamline the ELT bit, allowing us to focus our energies on analyzing data, rather than managing it.
Back in 2014, I was given an opportunity to re-architect SmartZip Analytics platform, and flagship product: SmartTargeting. This is a SaaS software helping real estate professionals keeping up with their prospects and leads in a given neighborhood/territory, finding out (thanks to predictive analytics) who's the most likely to list/sell their home, and running cross-channel marketing automation against them: direct mail, online ads, email... The company also does provide Data APIs to Enterprise customers.
I had inherited years and years of technical debt and I knew things had to change radically. The first enabler to this was to make use of the cloud and go with AWS, so we would stop re-inventing the wheel, and build around managed/scalable services.
For the SaaS product, we kept on working with Rails as this was what my team had the most knowledge in. We've however broken up the monolith and decoupled the front-end application from the backend thanks to the use of Rails API so we'd get independently scalable micro-services from now on.
Our various applications could now be deployed using AWS Elastic Beanstalk so we wouldn't waste any more efforts writing time-consuming Capistrano deployment scripts for instance. Combined with Docker so our application would run within its own container, independently from the underlying host configuration.
Storage-wise, we went with Amazon S3 and ditched any pre-existing local or network storage people used to deal with in our legacy systems. On the database side: Amazon RDS / MySQL initially. Ultimately migrated to Amazon RDS for Aurora / MySQL when it got released. Once again, here you need a managed service your cloud provider handles for you.
Future improvements / technology decisions included:
Caching: Amazon ElastiCache / Memcached CDN: Amazon CloudFront Systems Integration: Segment / Zapier Data-warehousing: Amazon Redshift BI: Amazon Quicksight / Superset Search: Elasticsearch / Amazon Elasticsearch Service / Algolia Monitoring: New Relic
As our usage grows, patterns changed, and/or our business needs evolved, my role as Engineering Manager then Director of Engineering was also to ensure my team kept on learning and innovating, while delivering on business value.
One of these innovations was to get ourselves into Serverless : Adopting AWS Lambda was a big step forward. At the time, only available for Node.js (Not Ruby ) but a great way to handle cost efficiency, unpredictable traffic, sudden bursts of traffic... Ultimately you want the whole chain of services involved in a call to be serverless, and that's when we've started leveraging Amazon DynamoDB on these projects so they'd be fully scalable.
I use Amazon DynamoDB because it integrates seamlessly with other AWS SaaS solutions and if cost is the primary concern early on, then this will be a better choice when compared to AWS RDS or any other solution that requires the creation of a HA cluster of IaaS components that will cost money just for being there, the costs not being influenced primarily by usage.
For most of the stuff we use MySQL. We just use Amazon RDS. But for some stuff we use Amazon DynamoDB. We love DynamoDB. It's amazing. We store usage data in there, for example. I think we have close to seven or eight hundred million records in there and it's scaled like you don't even notice it. You never notice any performance degradation whatsoever. It's insane, and the last time I checked we were paying $150 bucks for that.
zerotoherojs.com ’s userbase, and course details are stored in DynamoDB tables.
The good thing about AWS DynamoDB is: For the amount of traffic that I have, it is free. It is highly-scalable, it is managed by Amazon, and it is pretty fast.
It is, again, one less thing to worry about (when compared to managing your own MongoDB elsewhere).
We store customer metadata in DynamoDB. We decided to use Amazon DynamoDB because it was a fully managed, highly available solution. We didn't want to operate our own SQL server and we wanted to ensure that we built CloudRepo on high availability components so that we could pass that benefit back to our customers.
Aggressive archiving of historical data to keep the production database as small as possible. Using our in-house soon-to-be-open-sourced ETL library, SharpShifter.
몇몇 로그는 현재 AWS DynamoDB 에 기록되고 있습니다. 개선을 통해 mongodb 로 옮길 계획을 하고 있습니다. 아주 간단한 데이터를 쌓는 용도로는 나쁘지 않습니다. 다만, 쿼리가 아주 제한적입니다. 사용하기 전에 반드시 DynamoDB 의 스펙을 확인할 필요가 있습니다.
To store device health records as it allows super fast writes and range queries.