Hadoop vs MongoDB


Hadoop vs MongoDB: What are the differences?

What is Hadoop? Open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

What is MongoDB? The database for giant ideas. MongoDB stores data in JSON-like documents that can vary in structure, offering a dynamic, flexible schema. MongoDB was also designed for high availability and scalability, with built-in replication and auto-sharding.

Hadoop and MongoDB both belong to the "Databases" category of the tech stack.

"Great ecosystem" is the primary reason why developers consider Hadoop over the competitors, whereas "Document-oriented storage" was stated as the key factor in picking MongoDB.

Hadoop and MongoDB are both open source tools. MongoDB with 16.3K GitHub stars and 4.1K forks on GitHub appears to be more popular than Hadoop with 9.26K GitHub stars and 5.78K GitHub forks.

Uber Technologies, Lyft, and Codecademy are some of the popular companies that use MongoDB, whereas Hadoop is used by Airbnb, Uber Technologies, and Spotify. MongoDB has broader approval, being mentioned in 2189 company stacks and 2218 developer stacks, compared to Hadoop, which is listed in 237 company stacks and 127 developer stacks.

What is Hadoop?

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
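Hadoop's core programming model is MapReduce. As a rough illustration, here is a hypothetical word-count job written for Hadoop Streaming, which lets any executable act as mapper or reducer by reading stdin and writing stdout; the file names and paths are placeholders, not part of any particular deployment.

```python
#!/usr/bin/env python3
# mapper.py -- emits "word<TAB>1" for every word read from stdin.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(f"{word}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py -- Hadoop sorts mapper output by key before it reaches the
# reducer, so identical words arrive consecutively and can be summed.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, int(count)
if current_word is not None:
    print(f"{current_word}\t{current_count}")
```

A job like this would typically be submitted with the Hadoop Streaming jar, along the lines of `hadoop jar hadoop-streaming-*.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input <hdfs-input> -output <hdfs-output>`, with the exact jar path depending on the installation.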

What is MongoDB?

MongoDB stores data in JSON-like documents that can vary in structure, offering a dynamic, flexible schema. MongoDB was also designed for high availability and scalability, with built-in replication and auto-sharding.
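To illustrate that JSON-like, schema-flexible model, here is a minimal sketch using the pymongo driver; the database, collection, and field names are invented for the example.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumes a local MongoDB instance
products = client["shop"]["products"]              # hypothetical database/collection

# Documents in the same collection can have different shapes -- no schema
# migration is needed to add or omit fields.
products.insert_one({"name": "T-shirt", "price": 19.90, "sizes": ["S", "M", "L"]})
products.insert_one({"name": "Gift card", "price": 25.00, "delivery": {"type": "email"}})

# Queries can reach into nested fields with dot notation.
print(products.find_one({"delivery.type": "email"}))
```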

    What are some alternatives to Hadoop and MongoDB?
    Cassandra
Partitioning means that Cassandra can distribute your data across multiple machines in an application-transparent manner. Cassandra will automatically repartition as machines are added and removed from the cluster. Row store means that, like relational databases, Cassandra organizes data by rows and columns. The Cassandra Query Language (CQL) is a close relative of SQL.
    Elasticsearch
    Elasticsearch is a distributed, RESTful search and analytics engine capable of storing data and searching it in near real time. Elasticsearch, Kibana, Beats and Logstash are the Elastic Stack (sometimes called the ELK Stack).
    Splunk
    Splunk Inc. provides the leading platform for Operational Intelligence. Customers use Splunk to search, monitor, analyze and visualize machine data.
    HBase
Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Apache Hadoop.
    MySQL
    The MySQL software delivers a very fast, multi-threaded, multi-user, and robust SQL (Structured Query Language) database server. MySQL Server is intended for mission-critical, heavy-load production systems as well as for embedding into mass-deployed software.
    Decisions about Hadoop and MongoDB
MongoDB

I started using MongoDB because it was much easier to implement in production than hosted SQL, and I found that many of the limitations you expect from a document store versus a relational database were overcome by connecting the application to a GraphQL API, making retrieval seamless. MongoDB's latest upgrades, as well as Stitch and MongoDB Mobile, make it a perfect fit, especially if your application will be cross-platform across web and mobile.
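As a rough sketch of what putting a GraphQL API in front of MongoDB can look like (not the commenter's actual setup), here is a hypothetical resolver using the graphene and pymongo libraries; all type, field, and collection names are made up for the example.

```python
import graphene
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["app"]  # hypothetical database name

class Product(graphene.ObjectType):
    name = graphene.String()
    price = graphene.Float()

class Query(graphene.ObjectType):
    products = graphene.List(Product, max_price=graphene.Float())

    def resolve_products(root, info, max_price=None):
        # The resolver hides document-store details behind the GraphQL schema.
        criteria = {"price": {"$lte": max_price}} if max_price is not None else {}
        return [Product(name=d.get("name"), price=d.get("price"))
                for d in db["products"].find(criteria)]

schema = graphene.Schema(query=Query)
print(schema.execute("{ products(maxPrice: 30) { name price } }").data)
```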

Zach Coffin
Software Developer · 3 upvotes · 7.6K views
PostgreSQL
MongoDB

I started using PostgreSQL because I started a job at a company that was already using it, as well as MongoDB. The main difference between the two from my perspective is that Postgres columns are a chore to add/remove/modify, whereas you can throw whatever you want into a Mongo collection. Personally I prefer the query language of Postgres over that of Mongo, but they both have their merits. Maybe someday I'll be a DBA and have more insight to share, but for now that's my 2 cents.
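To make that contrast concrete, a small hypothetical sketch: in Postgres the column has to be added with a migration before any row can carry the value, while in MongoDB the new field simply appears on the next write. The connection string, table, collection, and field names are all invented.

```python
import psycopg2
from pymongo import MongoClient

# Postgres: the column must exist before any row can use it.
pg = psycopg2.connect("dbname=app user=app")  # hypothetical connection string
with pg, pg.cursor() as cur:
    cur.execute("ALTER TABLE users ADD COLUMN nickname TEXT")
    cur.execute("UPDATE users SET nickname = %s WHERE id = %s", ("zc", 1))

# MongoDB: just write the new field; existing documents are left untouched.
users = MongoClient()["app"]["users"]
users.update_one({"_id": 1}, {"$set": {"nickname": "zc"}})
```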

Antonio Sanchez
CEO at Kokoen GmbH · 12 upvotes · 103.6K views
PHP
Laravel
MySQL
Go
MongoDB
JavaScript
Node.js
ExpressJS

Back at the start of 2017, we decided to create a web-based tool for the on-page SEO analysis of our clients' websites. We had over 2,000 websites to analyze, so we had to perform thousands of requests to get every single page from those websites, process the information, and save the resulting large amounts of data somewhere.

Very soon we realized that the initially chosen scripting language and database stack, PHP with Laravel and MySQL, was not going to cope efficiently with such a task.

By that time, we were doing some experiments for other projects with a language we had recently gotten to know, Go, so we decided to give it a try and code the crawler with it. It was fantastic: we could process much more data with far less CPU power and in less time. By using the concurrency abilities the language offers, we could also make more HTTP requests in less time.
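The crawler itself was written in Go; purely to sketch the concurrent-fetch idea in the same language as the other examples on this page, here is a rough Python equivalent using a thread pool, with placeholder URLs and an arbitrary worker count.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import requests

urls = ["https://example.com/page1", "https://example.com/page2"]  # placeholder URLs

def fetch(url):
    # Each worker fetches one page; errors are returned instead of raised
    # so one bad site does not stop the whole crawl.
    try:
        return url, requests.get(url, timeout=10).status_code
    except requests.RequestException as exc:
        return url, exc

with ThreadPoolExecutor(max_workers=32) as pool:
    for future in as_completed(pool.submit(fetch, u) for u in urls):
        print(future.result())
```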

Unfortunately, I have no comparison numbers to show for the performance difference between Go and PHP, since the difference was so clear from the beginning that we didn't feel the need to run further comparison tests or document them. We just switched fully to Go.

There was still a problem: despite the large amount of data we were generating, MySQL was performing very well, but as we added more and more features, and with them more and more different types of data to save, it became a nightmare for the database architects to structure everything correctly in the database. So it was clear what we had to do next: switch to a NoSQL database. We switched to MongoDB, and it was also fantastic: we spent almost zero time thinking about how to structure the database, and performance also seemed better, but again, I have no comparison numbers to show due to lack of time.

We also decided to switch the website from PHP and Laravel to JavaScript with Node.js and ExpressJS, since working with the JSON data we were now saving in the database would be easier.

As of now, we don't only use the tool internally; we have also opened it up for everyone to use for free: https://tool-seo.com

Jeyabalaji Subramanian
CTO at FundsCorner · 24 upvotes · 358.3K views
MongoDB
PostgreSQL
MongoDB Stitch
Node.js
Amazon SQS
Python
SQLAlchemy
AWS Lambda
Zappa

    Recently we were looking at a few robust and cost-effective ways of replicating the data that resides in our production MongoDB to a PostgreSQL database for data warehousing and business intelligence.

We set ourselves the following criteria for the optimal tool that would do this job:
- The data replication must be near real-time, yet it should NOT impact the production database
- The data replication must be horizontally scalable (based on the load), asynchronous & crash-resilient

    Based on the above criteria, we selected the following tools to perform the end to end data replication:

We chose MongoDB Stitch for picking up the changes in the source database. It is the serverless platform from MongoDB. One of the services offered by MongoDB Stitch is Stitch Triggers. Using Stitch Triggers, you can execute a serverless function (written in Node.js) in real time in response to changes in the database. When there are a lot of database changes, Stitch automatically "feeds forward" these changes through an asynchronous queue.
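Stitch Triggers run JavaScript functions on MongoDB's serverless platform; as a rough illustration of the underlying "react to every database change" idea (not the author's actual code), here is a minimal Python sketch using MongoDB change streams via pymongo, with a hypothetical collection.

```python
from pymongo import MongoClient

orders = MongoClient("mongodb://localhost:27017")["app"]["orders"]  # hypothetical collection

# watch() yields one event per insert/update/delete/replace; change streams
# require a replica set. full_document="updateLookup" makes update events
# carry the whole document, not just the changed fields.
with orders.watch(full_document="updateLookup") as stream:
    for change in stream:
        print(change["operationType"], change["documentKey"])
```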

We chose Amazon SQS as the pipe / message backbone for communicating the changes from MongoDB to our own replication service. Interestingly enough, MongoDB Stitch offers integration with AWS services.

    In the Node.js function, we wrote minimal functionality to communicate the database changes (insert / update / delete / replace) to Amazon SQS.
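In the author's setup that hand-off happens inside the Node.js Stitch function; the sketch below shows an equivalent minimal payload forward to Amazon SQS using boto3, taking a change event like the ones from the change-stream sketch above. The queue URL and region are made up.

```python
import json
import boto3

sqs = boto3.client("sqs", region_name="eu-central-1")  # region is an assumption
QUEUE_URL = "https://sqs.eu-central-1.amazonaws.com/123456789012/db-changes"  # placeholder

def forward_change(change):
    # Ship only what the replication service needs: operation type, the
    # document key, and (when available) the full document.
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({
            "operation": change["operationType"],
            "document_key": change["documentKey"],
            "document": change.get("fullDocument"),
        }, default=str),
    )
```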

Next we wrote a minimal microservice in Python to listen to the message events on SQS, pick up the data payload & mirror the DB changes onto the target data warehouse. We implemented the source-to-target data translation by modelling the target table structures through SQLAlchemy. We deployed this microservice as an AWS Lambda with Zappa. With Zappa, deploying your service as an event-driven & horizontally scalable Lambda is dumb-easy.
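A minimal sketch of such a consumer, written as an AWS Lambda handler for an SQS event source and mirroring changes into a hypothetical Postgres table with SQLAlchemy; the connection string, table, and message shape are assumptions, not the author's actual implementation.

```python
import json
import os

from sqlalchemy import MetaData, Table, create_engine
from sqlalchemy.dialects.postgresql import insert

# Hypothetical warehouse connection; the target table is reflected from the database.
engine = create_engine(os.environ.get("WAREHOUSE_DSN", "postgresql://localhost/warehouse"))
orders = Table("orders", MetaData(), autoload_with=engine)

def handler(event, context):
    # With an SQS event source, Lambda hands the messages over in event["Records"].
    with engine.begin() as conn:
        for record in event["Records"]:
            payload = json.loads(record["body"])
            doc = payload.get("document")
            if payload["operation"] in ("insert", "update", "replace") and doc:
                # Upsert keyed on _id so replayed messages stay idempotent.
                stmt = insert(orders).values(**doc)
                conn.execute(stmt.on_conflict_do_update(index_elements=["_id"], set_=doc))
            # Deletes and other operations are omitted in this sketch.
    return {"processed": len(event["Records"])}
```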

In the end, we got to implement a highly scalable, near real-time Change Data Replication service that "works", and we deployed it to production in a matter of a few days!

Khauth György
CTO at SalesAutopilot Kft. · 12 upvotes · 115.4K views
Amazon CloudWatch
Amazon SNS
Amazon CloudFront
Amazon Route 53
MySQL
MongoDB
Redis
jQuery UI
Vue.js
Vuetify