HBase vs MongoDB: What are the differences?
What is HBase? The Hadoop database, a distributed, scalable, big data store. Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google' Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Apache Hadoop.
What is MongoDB? The database for giant ideas. MongoDB stores data in JSON-like documents that can vary in structure, offering a dynamic, flexible schema. MongoDB was also designed for high availability and scalability, with built-in replication and auto-sharding.
HBase and MongoDB can be categorized as "Databases" tools.
"Performance" is the primary reason why developers consider HBase over the competitors, whereas "Document-oriented storage" was stated as the key factor in picking MongoDB.
HBase and MongoDB are both open source tools. MongoDB with 16.3K GitHub stars and 4.1K forks on GitHub appears to be more popular than HBase with 2.91K GitHub stars and 2.01K GitHub forks.
According to the StackShare community, MongoDB has a broader approval, being mentioned in 2189 company stacks & 2218 developers stacks; compared to HBase, which is listed in 54 company stacks and 18 developer stacks.
What is HBase?
What is MongoDB?
Want advice about which of these to choose?Ask the StackShare community!
Sign up to add, upvote and see more prosMake informed product decisions
What are the cons of using HBase?
Sign up to get full access to all the companiesMake informed product decisions
Sign up to get full access to all the tool integrationsMake informed product decisions
Used MongoDB as primary database. It holds trip data of NYC taxis for the year 2013. It is a huge dataset and it's primary feature is geo coordinates with pickup and drop off locations. Also used MongoDB's map reduce to process this large dataset for aggregation. This aggregated result was then used to show visualizations.
MongoDB fills our more traditional database needs. We knew we wanted Trello to be blisteringly fast. One of the coolest and most performance-obsessed teams we know is our next-door neighbor and sister company StackExchange. Talking to their dev lead David at lunch one day, I learned that even though they use SQL Server for data storage, they actually primarily store a lot of their data in a denormalized format for performance, and normalize only when they need to.
The final output is inserted into HBase to serve the experiment dashboard. We also load the output data to Redshift for ad-hoc analysis. For real-time experiment data processing, we use Storm to tail Kafka and process data in real-time and insert metrics into MySQL, so we could identify group allocation problems and send out real-time alerts and metrics.
Nearly all of our backend storage is on MongoDB. This has also worked out pretty well. It's enabled us to scale up faster/easier than if we had rolled our own solution on top of PostgreSQL (which we were using previously). There have been a few roadbumps along the way, but the team at 10gen has been a big help with thing.
We are testing out MongoDB at the moment. Currently we are only using a small EC2 setup for a delayed job queue backed by
agenda. If it works out well we might look to see where it could become a primary document storage engine for us.
Used for proofs of concept and personal projects with a document data model, especially with need for strong geographic queries. Often not chosen in long term apps due to chance data model can end up relational as needs develop.