Amazon EMR vs Neo4j: What are the differences?

Developers describe Amazon EMR as "Distribute your data and processing across Amazon EC2 instances using Hadoop". Amazon EMR is used in a variety of applications, including log analysis, web indexing, data warehousing, machine learning, financial analysis, scientific simulation, and bioinformatics. Customers launch millions of Amazon EMR clusters every year.

Neo4j, on the other hand, is described as "The world’s leading Graph Database". Neo4j stores data in nodes connected by directed, typed relationships, with properties on both nodes and relationships; this model is known as a Property Graph. It is a high-performance graph store with the features expected of a mature and robust database, such as a friendly query language and ACID transactions.

Amazon EMR can be classified as a tool in the "Big Data as a Service" category, while Neo4j is grouped under "Graph Databases".

Some of the features offered by Amazon EMR are:

  • Elastic: Amazon EMR enables you to quickly and easily provision as much capacity as you need and add or remove capacity at any time. Deploy multiple clusters or resize a running cluster.
  • Low Cost: Amazon EMR is designed to reduce the cost of processing large amounts of data. Some of the features that make it low cost include low hourly pricing, Amazon EC2 Spot integration, Amazon EC2 Reserved Instance integration, elasticity, and Amazon S3 integration.
  • Flexible Data Stores: With Amazon EMR, you can leverage multiple data stores, including Amazon S3, the Hadoop Distributed File System (HDFS), and Amazon DynamoDB.

On the other hand, Neo4j provides the following key features:

  • intuitive, using a graph model for data representation
  • reliable, with full ACID transactions
  • durable and fast, using a custom disk-based, native storage engine

"On demand processing power" is the primary reason why developers consider Amazon EMR over the competitors, whereas "Cypher – graph query language" was stated as the key factor in picking Neo4j.

Neo4j is an open source tool with 6.61K GitHub stars and 1.63K GitHub forks.

According to the StackShare community, Neo4j has broader approval: it is mentioned in 114 company stacks and 47 developer stacks, compared to Amazon EMR, which is listed in 95 company stacks and 18 developer stacks.

Advice on Amazon EMR and Neo4j
Jaime Ramos needs advice on ArangoDB, Dgraph, and Neo4j

Hi, I want to create a social network for students, and I was wondering which of these three graph-oriented databases you would recommend. I plan to implement machine learning algorithms such as k-means and others to give recommendations and run some basic data analyses. Everything is going to be hosted in the cloud, so I expect the DB to be hosted there too. I want the queries to be as fast as possible, and I would like good tools to monitor my data. I would appreciate any recommendations or thoughts.

Context:

I released the MVP 6 months ago and got almost 600 users just from my university in Colombia, but now I want to expand it all over my country. I am expecting roughly 20,000 users.

Replies (3)
Recommends ArangoDB

I have not used the others, but I agree: ArangoDB should meet your needs. If you have worked with an RDBMS and SQL before, ArangoDB will be an easy transition. AQL is simple yet powerful, and a deployment can be as small or as large as you need. I love the fact that for local development I can run it as a Docker container as part of my project, and for production I can have multiple machines in a cluster. The project is also under active development, and with the latest round of funding I feel comfortable that it will be around for a while.

David López Felguera
Full Stack Developer at NPAW · 5 upvotes · 52.7K views
Recommends ArangoDB

Hi Jaime. I've worked with Neo4j and ArangoDB for a few years, and I prefer ArangoDB because its query syntax (AQL) is easier. I've built a network topology with both databases, and ArangoDB is now the database for that network topology. ArangoDB also has ArangoML, which may help you with your recommendation algorithms.

Recommends ArangoDB

Hi Jaime, I have worked with ArangoDB quite a lot for about 3 years. Before that, I did some investigation and chose ArangoDB over Neo4j because it is a multi-model database, and for its speed and clustering (although we do not use clustering now). We now have relational and graph data working together. As others said, AQL is quite easy, but you can also use one of the drivers, such as the Java Spring one, which takes you to another level. If you prefer more copy-paste with little rework, perhaps Neo4j can do the job for you, because there is a bigger community around it. But I have had to solve some issues with the ArangoDB community, and it is also fast, so I would prefer ArangoDB. By the way, there is a super easy Foxx Microservice tool in ArangoDB that can help you solve basic things faster than writing a robust back end.

Pros of Amazon EMR (upvote counts in parentheses)
  • On demand processing power (15)
  • Don't need to maintain Hadoop Cluster yourself (12)
  • Hadoop Tools (7)
  • Elastic (6)
  • Backed by Amazon (4)
  • Flexible (3)
  • Economic - pay as you go, easy to use CLI and SDKs (3)
  • Don't need a dedicated Ops group (2)
  • Massive data handling (1)
  • Great support (1)

Pros of Neo4j (upvote counts in parentheses)
  • Cypher – graph query language (69)
  • Great graphdb (61)
  • Open source (33)
  • Rest api (31)
  • High-Performance Native API (27)
  • ACID (23)
  • Easy setup (21)
  • Great support (17)
  • Clustering (11)
  • Hot Backups (9)
  • Great Web Admin UI (8)
  • Powerful, flexible data model (7)
  • Mature (7)
  • Embeddable (6)
  • Easy to Use and Model (5)
  • Highly-available (4)
  • Best Graphdb (4)
  • It's awesome, I wanted to try it (2)
  • Great onboarding process (2)
  • Great query language and built in data browser (2)
  • Used by Crunchbase (2)


Cons of Amazon EMR
  • None listed yet

Cons of Neo4j
  • Comparably slow (9)
  • Can't store a vertex as JSON (4)
  • Doesn't have a managed cloud service at low cost (1)



What are some alternatives to Amazon EMR and Neo4j?

  • Amazon EC2: A web service that provides resizable compute capacity in the cloud. It is designed to make web-scale computing easier for developers.
  • Hadoop: The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
  • Amazon DynamoDB: With it, you can offload the administrative burden of operating and scaling a highly available distributed database cluster, while paying a low price for only what you use.
  • Amazon Redshift: It is optimized for data sets ranging from a few hundred gigabytes to a petabyte or more and costs less than $1,000 per terabyte per year, a tenth the cost of most traditional data warehousing solutions.
  • Azure HDInsight: A cloud-based service from Microsoft for big data analytics that helps organizations process large amounts of streaming or historical data.