Amazon DynamoDB vs Amazon EMR

Overview

Amazon EMR: 543 stacks, 682 followers, 54 votes

Amazon DynamoDB: 4.0K stacks, 3.2K followers, 195 votes

Amazon DynamoDB vs Amazon EMR: What are the differences?

Introduction

Amazon DynamoDB and Amazon EMR are both managed data services from Amazon Web Services (AWS), but they solve different problems: DynamoDB stores and serves data, while EMR processes it at scale. The key differences below determine which one fits a given use case.

Data Structure: DynamoDB is a NoSQL database service, while EMR is a managed cluster platform for distributed big data processing. DynamoDB stores data as key-value items and documents addressed by a primary key, which allows fast and predictable performance. EMR is designed to process large amounts of unstructured and semi-structured data using tools like Apache Hadoop, Spark, and Hive.
Scalability: DynamoDB is a fully managed service that scales tables to meet the requested throughput capacity without manual intervention, and it can handle millions of requests per second. With EMR, you provision a cluster with a specific number of compute instances to process your data. You can resize a running cluster manually or enable EMR's automatic scaling options, but choosing instance types, cluster size, and configuration remains your responsibility.
Data Availability: DynamoDB offers built-in multi-region replication through global tables, allowing you to replicate your data across multiple AWS regions for enhanced availability and disaster recovery. An EMR cluster runs in a single region, so you need to configure and manage replication of the underlying data yourself if you require availability across regions.
Data Processing Options: DynamoDB provides limited data processing capabilities: key-based queries, filter expressions, and projections, with no built-in joins or aggregate functions such as SUM or AVG. It is best suited for simple, low-latency data access patterns. EMR offers a wide range of data processing options through the big data frameworks it supports, which lets you perform complex transformations, machine learning tasks, and analytics on large datasets.
Cost Model: DynamoDB charges for throughput, either as provisioned capacity or per request in on-demand mode, plus the amount of data stored. Provisioned pricing is predictable and can be tuned to your workload. EMR charges for the EC2 instances used in the cluster, storage, and other associated services, so its cost varies with the size and duration of your data processing jobs.
Use Case Fit: DynamoDB is suitable for applications that require simple, low-latency data access with predictable performance, such as real-time applications, gaming leaderboards, and session stores. EMR is well suited for big data processing and analytics use cases, where you need to process large volumes of data with various frameworks and perform complex data transformations.
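
The contrast between key-based access and large-scale processing can be sketched in a few lines. The following is a toy, in-memory illustration in Python, not AWS API code: the `scores` dictionary, `get_score`, and `total_by_player` are hypothetical names standing in for a DynamoDB-style table, a point lookup, and an EMR-style scan-and-aggregate job.

```python
from collections import defaultdict

# In-memory stand-in for a key-value table keyed by (game_id, player).
# This is an illustration only; it does not call DynamoDB or EMR.
scores = {
    ("game#1", "alice"): 120,
    ("game#1", "bob"): 95,
    ("game#2", "alice"): 40,
}


def get_score(game_id, player):
    """Point lookup by full key: the access pattern a key-value store is built for."""
    return scores.get((game_id, player))


def total_by_player(table):
    """Full scan plus aggregation: the kind of work usually handed to Spark or Hive."""
    totals = defaultdict(int)
    for (_game_id, player), score in table.items():
        totals[player] += score
    return dict(totals)


print(get_score("game#1", "bob"))
print(total_by_player(scores))
```

The lookup touches one item regardless of table size, while the aggregation must read every item. At small scale the difference is invisible, but it is the reason the two services are typically paired rather than substituted for each other.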

In summary, Amazon DynamoDB is a NoSQL database service that provides fast and scalable key-value data storage, while Amazon EMR is a managed cluster platform for processing and analyzing large datasets with frameworks such as Hadoop, Spark, and Hive. The choice depends on whether you need low-latency storage and retrieval (DynamoDB) or large-scale processing and analytics (EMR).


Advice on Amazon EMR, Amazon DynamoDB

Doru

Solution Architect

Jun 9, 2019

Review on Amazon DynamoDB

I use Amazon DynamoDB because it integrates seamlessly with other AWS managed services. If cost is the primary concern early on, it is a better choice than Amazon RDS or any other solution that requires a highly available cluster of IaaS components, which costs money just for existing, largely regardless of usage.


Detailed Comparison

Amazon EMR

Description: It is used in a variety of applications, including log analysis, data warehousing, machine learning, financial analysis, scientific simulation, and bioinformatics.
Features:
Elastic: Amazon EMR enables you to quickly and easily provision as much capacity as you need and add or remove capacity at any time. Deploy multiple clusters or resize a running cluster.
Low Cost: Amazon EMR is designed to reduce the cost of processing large amounts of data. Some of the features that make it low cost include low hourly pricing, Amazon EC2 Spot integration, Amazon EC2 Reserved Instance integration, elasticity, and Amazon S3 integration.
Flexible Data Stores: With Amazon EMR, you can leverage multiple data stores, including Amazon S3, the Hadoop Distributed File System (HDFS), and Amazon DynamoDB.
Hadoop Tools: EMR supports powerful and proven Hadoop tools such as Hive, Pig, and HBase.
Statistics: 543 stacks, 682 followers, 54 votes
Pros (votes): On demand processing power (15); Don't need to maintain Hadoop Cluster yourself (12); Hadoop Tools (7); Elastic (6); Backed by Amazon (4)
Integrations: No integrations available

Amazon DynamoDB

Description: With it, you can offload the administrative burden of operating and scaling a highly available distributed database cluster, while paying a low price for only what you use.
Features:
Automated Storage Scaling: There is no limit to the amount of data you can store in a DynamoDB table, and the service automatically allocates more storage as you store more data using the DynamoDB write APIs.
Provisioned Throughput: When creating a table, simply specify how much request capacity you require. DynamoDB allocates dedicated resources to your table to meet your performance requirements, and automatically partitions data over a sufficient number of servers to meet your request capacity.
Fully Distributed, Shared Nothing Architecture
Statistics: 4.0K stacks, 3.2K followers, 195 votes
Pros (votes): Predictable performance and cost (62); Scalable (56); Native JSON Support (35); AWS Free Tier (21); Fast (7)
Cons (votes): Only sequential access for paginated data (4); Scaling (1); Document Limit Size (1)
Integrations: Amazon RDS for PostgreSQL, PostgreSQL, MySQL, SQLite, Azure Database for MySQL
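
The Provisioned Throughput feature in the comparison above asks you to state request capacity up front. As a rough sketch of that arithmetic, the Python below estimates capacity units under the commonly documented sizing rules for provisioned mode: one read unit covers one strongly consistent read per second of an item up to 4 KB (an eventually consistent read costs half), and one write unit covers one write per second of an item up to 1 KB. The function names and the example workload are hypothetical, and transactional requests, which cost more, are not modeled.

```python
import math


def read_capacity_units(reads_per_sec, item_kb, strongly_consistent=True):
    """Estimate read capacity units for a steady read rate (assumed sizing rules)."""
    # Item size is rounded up to the next 4 KB block.
    units_per_read = math.ceil(item_kb / 4)
    total = reads_per_sec * units_per_read
    if not strongly_consistent:
        # Eventually consistent reads consume half as much capacity.
        total = total / 2
    return math.ceil(total)


def write_capacity_units(writes_per_sec, item_kb):
    """Estimate write capacity units for a steady write rate (assumed sizing rules)."""
    # Item size is rounded up to the next 1 KB block.
    return writes_per_sec * math.ceil(item_kb)


# Hypothetical workload: 100 reads/s of 6 KB items, 50 writes/s of 2.5 KB items.
print(read_capacity_units(100, 6))
print(read_capacity_units(100, 6, strongly_consistent=False))
print(write_capacity_units(50, 2.5))
```

Because item size is rounded up per request, keeping items just under a 4 KB or 1 KB boundary can noticeably reduce the capacity you need to provision.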

What are some alternatives to Amazon EMR, Amazon DynamoDB?

Google BigQuery

Run super-fast, SQL-like queries against terabytes of data in seconds, using the processing power of Google's infrastructure. Load data with ease. Bulk load your data using Google Cloud Storage or stream it in. Easy access. Access BigQuery by using a browser tool, a command-line tool, or by making calls to the BigQuery REST API with client libraries such as Java, PHP or Python.

Azure Cosmos DB

Azure Cosmos DB (formerly Azure DocumentDB) is a fully managed NoSQL database service built for fast and predictable performance, high availability, elastic scaling, global distribution, and ease of development.

Cloud Firestore

Cloud Firestore is a NoSQL document database that lets you easily store, sync, and query data for your mobile and web apps - at global scale.

Amazon Redshift

Amazon Redshift is optimized for data sets ranging from a few hundred gigabytes to a petabyte or more and costs less than $1,000 per terabyte per year, a tenth the cost of most traditional data warehousing solutions.

Qubole

Qubole is a cloud based service that makes big data easy for analysts and data engineers.

Cloudant

Cloudant’s distributed database as a service (DBaaS) allows developers of fast-growing web and mobile apps to focus on building and improving their products, instead of worrying about scaling and managing databases on their own.

Altiscale

We run Apache Hadoop for you. We not only deploy Hadoop, we monitor, manage, fix, and update it for you. Then we take it a step further: we monitor your jobs, notify you when something’s wrong with them, and can help with tuning.

Snowflake

Snowflake eliminates the administration and management demands of traditional data warehouses and big data platforms. Snowflake is a true data warehouse as a service running on Amazon Web Services (AWS)—no infrastructure to manage and no knobs to turn.

Google Cloud Bigtable

Google Cloud Bigtable offers you a fast, fully managed, massively scalable NoSQL database service that's ideal for web, mobile, and Internet of Things applications requiring terabytes to petabytes of data. Unlike comparable market offerings, Cloud Bigtable doesn't require you to sacrifice speed, scale, or cost efficiency when your applications grow. Cloud Bigtable has been battle-tested at Google for more than 10 years—it's the database driving major applications such as Google Analytics and Gmail.

Google Cloud Datastore

Use a managed, NoSQL, schemaless database for storing non-relational data. Cloud Datastore automatically scales as you need it and supports transactions as well as robust, SQL-like queries.
