Need advice about which tool to choose?Ask the StackShare community!
HBase vs MongoDB: What are the differences?
Introduction
In this article, we will discuss the key differences between HBase and MongoDB, two popular NoSQL databases. Both HBase and MongoDB are designed to handle big data, but they have significant differences in their architecture and functionality.
Data Structure: HBase is a column-oriented database, where data is stored in column families and each column family consists of multiple columns. On the other hand, MongoDB is a document-oriented database, where data is stored in documents that are similar to JSON objects. This fundamental difference in data structure affects how data is stored, queried, and manipulated in each database.
Scalability: HBase is built on top of Hadoop and HDFS, which enables it to scale horizontally across multiple machines. It can handle very large datasets and offers automatic sharding and replication for high availability and fault tolerance. MongoDB also supports horizontal scalability, but it relies on sharding to distribute data across multiple nodes. Sharding in MongoDB requires additional configuration and management compared to HBase.
Consistency: HBase provides strong consistency guarantees, where all read and write operations are immediately consistent with the latest version of the data. On the other hand, MongoDB provides eventual consistency by default, where updates made to the database may take some time to propagate to all replicas. MongoDB offers tunable consistency options, but strong consistency comes with performance trade-offs.
Querying: HBase supports structured queries using a query language called HBase Shell, which allows filtering and aggregation based on column values. MongoDB, on the other hand, provides a rich query language that supports ad-hoc queries on any field within a document. MongoDB also supports indexing and secondary indexes for efficient querying.
Data Model: HBase is schema-less, meaning that it does not enforce any predefined structure for data storage. Each column can have a different data type, allowing flexibility in storing different types of data. MongoDB, on the other hand, is schema-less at the document level, but it enforces a schema at the collection level. Each document within a collection must conform to the same structure, but different collections within the same database can have different schemas.
Transaction Support: HBase does not provide built-in support for transactions. However, it supports atomic read-modify-write operations on individual rows. MongoDB introduces multi-document ACID transactions in version 4.0, allowing multiple operations to be grouped together as a single transaction. This provides more flexibility and ensures data consistency in complex operations involving multiple documents.
In Summary, HBase and MongoDB differ in their data structure, scalability, consistency models, querying capabilities, data model flexibility, and transaction support. Understanding these differences is crucial in choosing the appropriate database solution for specific use cases and requirements.
Hello everyone. We have a project that it's like a candidate tracking system. It has candidates, projects, assessments, etc. A consultant senior developer started it by using MongoDB. The thing is that he designed the database like it's a relational DB.
Personally, I didn't imagine that it was a good thing to do. Because you won't have the power of SQL functionalities like join
, on delete
, and more. You have to be very careful, I think things may go unmaintainable very fast. I asked him about this and he said "I don't see a problem doing it like this."
What are your thoughts on this? Did you see examples like this? Should I avoid it or go for it? Any advice is appreciated.
Here is what it looks like in Moon Modeler: https://imgur.com/a/RNwNBNY
It happened to me that you actually construct a relational schema with MongoDB. It is not good. You do not use the modeling benefits of MongoDB, and you do not have the benefits of SQL. So I recommend taking it into MySQL. Since you think in a relational way, it is best you move to MySQL
Specifically, do you need non-normalized data? If not, MySQL is best. Otherwise, MongoDB is best. If you think non-relational, you do not need joins, and the problems with cascade disappear.
What is the best way to think? If you work in terms of whole tree of related object, then you think non-relational and non-normalized.
It makes no sense if you use MongoDB primarily as a relational database. As you scale MongoDB will be more expensive than SQL and as you said without having the advantages of "join" etc.
We use MongoDB in our company. It is useful for us, as we work with different types of devices and we love the functionality of being able to add fields whenever we have a new device type etc. Mongo also allows enables easy scaling and fault tolerance. However, you will have to learn how to manage it.
If you are already comfortable with SQL and don't need NoSQL, stick to SQL. At scale, it is cheaper than Mongo.
I have been using Firebase with almost all my web projects as well as SwiftUI projects. I use it for the database as well as the user authentication via Google.
Is it good enough?? I have learned MySQL but I'm not that comfortable…
So for user authentication and database should I keep using firebase or switch to MySQL or MongoDB?? Or any other combination?
Look if you are comfortable with firebase you can go with it, after all, It's all about development and running your program bug-free and fast, but firebase is costly fo long run and if you are comfortable with that cost then I suggest you go with it.
Hi!
I’m not an expert, but I can tell you some things:
- Firebase is a great option for a very simple to implement, fast and reliable authentication method. Nonetheless, the free authentications are limited, so if you will potentially have millions of monthly authentications, it’s probably best to take the time to build it into your app directly.
- MySQL is great for simple tables where the data structures are not too complex, but it lacks some speed when you are trying to retrieve time data series. Also, I believe it’s a bit more difficult to distribute.
- MongoDB is great when your information is a bit more complex and you need very peculiar data structures, nested data, dynamic structures, etc. For me at least, it’s a bit more complex to master than MySQL, but the freedom it gives you is incredible. It also performs super fast, especially with time data series, and if I’m not wrong, it’s more scalable.
In general, almost all technologies have their good things, it’s just a matter of what you want to do and then choosing the right ones.
Doing User authentication (oauth) and session management by ourself is kind a challenging, so if possible use firebase itself since it provides these features out of the box.
I'm starting to work on a Jira-like bug tracker web app. This is a hobby project that is mostly a way for me to learn about different technologies and development processes(CI/CD, etc..) so I could be more ready when I start applying for programming jobs.
I'm debating between MySQL, which I'm less familiar with, and MongoDB which I have used in the past.
My two points of consideration are the following:
1) Which one is more likely to be relevant for web dev jobs? While I want to learn new technologies, I prefer learning ones that will make me more hireable in the future.
2) Which one is more flexible when it comes to changing the shape of the stored data? I expect to need to make some changes as the project goes on.
Thanks, everyone!
MySQL is still more popular than MongoDB if you look at Google Trends. I've also added MariaDB, which is pretty much a copy from MySQL and its features, and PostgreSQL, which is also a popular relational database.
This is a very good article for comparing MySQL to MongoDB and which one you should use: MongoDB vs MySQL: A Comparative Study on Databases.
If you just want to learn and you have the time, I would opt for using both MySQL and MongoDB. For example using MySQL for most of the site content and MongoDB for saving log messages. As you get more and more logs you start to see the benefits from MongoDB's faster document fetching.
There's really not an awful lot of difference between the two, they have wildly different storage mechanisms but they each have their fairly similar benefits. If you want to learn something that might be a requisite skill for a job, I would also look at alternatives such as time based and column based systems like InfluxDB and the unbelievably fast and flexible ClickHouse. While they may seem like an unlikely fit for a personal bug tracker app, there's no reason not to use them. Since I got into InfluxDB people have been requesting it a lot and I'll be using ClickHouse for all large databases, probably forever. Expand your horizons beyond your competition's.
I’m doing a school project where I have to design a database for a password manager app like 1Password, bitwarden… I’m not sure which database paradigms I should use. Users would have the ability to create vaults and each vault will have many items and can be sorted into favorite, category, tag list… Please help.
What I have learned through several years of experience, is that by default you should consider SQL database (like PostgreSQL, MySQL, ...) and if does not suit you then you should explore other noSQL options.
SQL is very solid and it can do almost anything and can support almost any kind of systems.
So, for your case I would recommend that you go with SQL. You should start by listing your use cases and infer from them your entities and relations, and work on them in a Top-to-bottom manner, meaning that you should have some entities that are the core dependencies for the other entities. Or, in other words, they can exist without other entities existing, but the opposite is not true, these are your core entities that you should work on first, then gradually build the other entities.
One way to figure out the core entities is to follow how the users will behave in your system, what will the user create first, and what is dependant on other entities.
For example, in your case, on way to do it is to start with the "vault", as everything else cannot exist without it (and it's the first thing a user would create), then do passwords as they depend on the vaults (I would say passwords are "under" the vault), then once you do them, you can start working on tags then categories, and so on...
I am researching different querying solutions to handle ~1 trillion records of data (in the realm of a petabyte). The data is mostly textual. I have identified a few options: Milvus, HBase, RocksDB, and Elasticsearch. I was wondering if there is a good way to compare the performance of these options (or if anyone has already done something like this). I want to be able to compare the speed of ingesting and querying textual data from these tools. Does anyone have information on this or know where I can find some? Thanks in advance!
You've probably come to a decision already but for those reading...here are some resources we put together to help people learn more about Milvus and other databases https://zilliz.com/comparison and https://github.com/zilliztech/VectorDBBench. I don't think they include RocksDB or HBase yet (you could could recommend on GitHub) but hopefully they help answer your Elastic Search questions.
I have a project (in production) that a part of it is generating HTML from JSON object normally we use Microsoft SQL Server only as our main database. but when it comes to this part some team members suggest working with a NoSQL database as we are going to handle JSON data for both retrieval and querying. others replied that will add complexity and we will lose SQL Servers' Unit Of Work which will break the Atomic behavior, and they suggest to continue working with SQL Server since it supports working with JSON. If you have practical experience using JSON with SQL Server, kindly share your feedback.
I agree with the advice you have been given to stick with SQL Server. If you are on the latest SQL Server version you can query inside the JSON field. You should set up a test database with a JSON field and try some queries. Once you understand it and can demonstrate it, show it to the other developers that are suggesting MongoDB. Once they see it working with their own eyes they may drop their position of Mongo over SQL. I would only seriously consider MongoDB if there was no other SQL requirements. I wouldn't do both. I'd be all SQL or all Mongo.
I think the key thing to look for is what kind of queries you're expecting to do on that JSON and how stable that data is going to be. (And if you actually need to store the data as JSON; it's generally pretty inexpensive to generate a JSON object)
MongoDB gets rid of the relational aspect of data in favor of data being very fluid in structure.
So if your JSON is going to vary a lot/is unpredictable/will change over time and you need to run queries efficiently like 'records where the field x exists and its value is higher than 3', that's a great use case for MongoDB.
It's hard to solve this in a standard relational model: Indexing on a single column that has wildly different values is pretty much impossible to do efficiently; and pulling out the data in its own columns is hard because it's hard to predict how many columns you'd have or what their datatypes would be. If this sounds like your predicament, 100% go for MongoDB.
If this is always going to be more or less the same JSON and the fields are going to be predictably the same, then the fact that it's JSON doesn't particularly matter much. Your indexes are going to approach it similar to a long string.
If the queried fields are very predictable, you should probably consider storing the fields as separate columns to have better querying capabilities. Ie if you have {"x":1, "y":2}, {"x":5, "y":6}, {"x":9, "y":0} - just make a table with an x and y column and generate the JSON. The CPU hit is worth it compared to the querying capabilities.
Hey everyone, My users love Microsoft Excel, and so do I. I've been making tools for them in the form of workbooks for years, these tools usually have databases included in the spreadsheets or communicate to free APIs around the web, but now I want to distribute these tools in the form of Excel Add-ins for several reasons.
I want these Add-ins to communicate to a personal server to authorize users, read from my databases, and write to them while they're using their Excel environment. I have never built a website, so what would be a good solution for this, considering I'm new to all of these technologies? I know about the existence of Microsoft Azure, Microsoft SharePoint, and Google Sheets, but I don't know how to feel about those.
Just definitely don't use firebase. All of MongoDB, MySQL, MariaDB and PostGreSQL have a lot of community support and history.
Snowflake is a NoSQL database in the cloud, which also accepts SQL calls. Users can obtain an ODBC driver for SnowFlake, which would allow your Excel apps to write/read from the backend, locally.
Hey, we want to build a referral campaign mechanism that will probably contain millions of records within the next few years. We want fast read access based on IDs or some indexes, and isolation is crucial as some listeners will try to update the same document at the same time. What's your suggestion between Couchbase and MongoDB? Thanks!
I am biased (work for Scylla) but it sounds like a KV/wide column would be better in this use case. Document/schema free/lite DBs data stores are easier to get up and running on but are not as scalable (generally) as NoSQL flavors that require a more rigid data model like ScyllaDB. If your data volumes are going to be 10s of TB and transactions per sec 10s of 1000s (or more), look at Scylla. We have something called lightweight transactions (LWT) that can get you consistency.
I have found MongoDB highly consistent and highly available. It suits your needs. We usually trade off partion tolerance fot this. Having said that, I am little biased in recommendation as I haven't had much experience with couchbase on production.
I'm planning to build a freelance marketplace website, using tools like Next.js, Firebase Authentication, Node.js, but I need to know which type of database is suitable with performance and powerful features. I'm trying to figure out what the best stack is for this project. If anyone has advice please, I’d love to hear more details. Thanks.
Postgres and MySQL are very similar, but Mongo has differences in terms of storage type and the CAP theorem. For your requirement, I prefer Postgres (or MySQL) over MongoDB. Mongo gives you no schema which is not always good. on the other hand, it is more common in NodeJS community, so you may find more articles about Node-Mongo stuff. I suggest to stay with RDBMS if possible.
This is a little about experience. Postgresql is fine. You can use either the related table structure or the json table structure.
We have a ready-made engine for the online exchange and marketplace. To customize it, you only need to know sql. Connecting any database is not a problem. https://falconspace.site/list/solutions
For learning purposes, I am trying to design a dashboard that displays the total revenue from all connected webshops/marketplaces, displaying incoming orders, total orders, etc.
So I will need to get the data (using Node backend) from the Shopify and marketplace APIs, storing this in the database, and get the data from the back end.
My question is:
What kind of database should I use? Is MongoDB fine for storing this kind of data? Or should I go with a SQL database?
Postgres is a solid database with a promising background. In the relational side of database design, I see Postgres as an absolute; Now the arguments and conflicts come in when talking about NoSQL data types. The truth is jsonb in Postgres is efficient and gives a good performance and storage. In a comparison with MongoDB with the same resources (such as RAM and CPU) with better tools and community, I think you should go for Postgres and use jsonb for some of the data. All in all, don't use a NoSQL database just cause you have the data type matching this tech, have both SQL and NoSQL at the same time.
I have found MongoDB easier to work with. Postgres and SQL in general, in my experience, is harder to work with. While Postgres does provide data consistency, MongoDB provides flexibility. I've found the MongoDB ecosystem to be really great with a good community. I've worked with MongoDB in production and it's been great. I really like the aggregation system and using query operators such as $in, $pull, $push.
While my opinion may be unpopular, I have found MongoDB really great for relational data, using aggregations from a code perspective. In general, data types are also more flexible with MongoDB.
I will use PostgreSQL because you have more powerfull feature for data agregation and views (the raw data from shopify and others could be stored as is) and then use views to produce diff. kind of reports unless you wanna create those aggregations/views in nodejs code. HTH
Fauna is a serverless database where you store data as JSON. Also, you have build in a HTTP GraphQL interface with a full authentication & authorization layer. That means you can skip your Backend and call it directly from the Frontend. With the power, that you can write data transformation function within Fauna with her own language called FQL, we're getting a blazing fast application.
Also, Fauna takes care about scaling and backups (All data are sharded on three different locations on the globe). That means we can fully focus on writing business logic and don't have to worry anymore about infrastructure.
I’m newbie I was developing a pouchdb and couchdb app cause if the sync. Lots of learning very little code available. I dropped the project cause it consumed my life. Yeats later I’m back into it. I researched other db and came across rethinkdb and mongo for the subscription features. With socketio I should be able to create and similar sync feature. Attempted to use mongo. I attempted to use rethink. Rethink for the win. Super clear l. I had it running in minutes on my local machine and I believe it’s supposed to scale easy. Mongo wasn’t as easy and there free online db is so slow what’s the point. Very easy to find mongo code examples and use rethink code in its place. I wish I went this route years ago. All that corporate google Amazon crap get bent. The reason they have so much power in the world is cause you guys are giving it to them.
We started using PostgreSQL because there's no need to upgrade to an enterprise plan to access certain essential features. Postgres is essentially plug-and-play; you download it, install it, and there you go!
Another benefit of using Postgres is that you get to use SQL (Structured Query Language)—which isn't for everyone, but I enjoy how flexible and versatile it is.
Postgres also has point-in-time recovery, which you can export wherever you want—This means you can restore data from any given point in time. With this in mind, if you delete something accidentally, you can go back in time and grab said data without restoring the whole database.
Not to mention Postgres is remarkably fast with several thorough benchmarks comparing it to MongoDB, where Postgres mostly came out on top.
All the benefits of relational joins and constraints, with JSON field types in Postgres to allow for flexibility like mongo. Objection ORM makes query building seamless and abstracts away a lot of complexity of SQL queries.
MongoDB tends to get slow with scale and requires a lot of code to maintain consistency across collections as foreign keys and other constraints are harder to implement. PostgreSQL also has a vibrant community with battle tested stability and horizontal scalability when needed.
MongoDB's document-oriented paradigm is nicely suited to the results of our ML model. We felt that this compatibility offered some time savings on figuring out and implementing an extensive data formatting and processing system. MongoDB's flexible schemas schemas (due to it being non-relational) were also attractive as a source of additional agility for our development process. The MongoDB ecosystem also has great GUI tools to simplify testing.
At Pushnami we were looking at several alternative databases that would support following architectural requirements: - very quick prototyping for an unknown domain - ability to support large amounts of data - native ability to replicate and fail over - full stack approach for Node.js development After careful consideration MongoDB came on top, and 3 years later we are still very happy with that decision. Currently we keep almost 2TB of data in our cluster, and start thinking about sharding.
After using couchbase for over 4 years, we migrated to MongoDB and that was the best decision ever! I'm very disappointed with Couchbase's technical performance. Even though we received enterprise support and were a listed Couchbase Partner, the experience was horrible. With every contact, the sales team was trying to get me on a $7k+ license for access to features all other open source NoSQL databases get for free.
Here's why you should not use Couchbase
Full-text search Queries The full-text search often returns a different number of results if you run the same query multiple types
N1QL queries Configuring the indexes correctly is next to impossible. It's poorly documented and nobody seems to know what to do, even the Couchbase support engineers have no clue what they are doing.
Community support I posted several problems on the forum and I never once received a useful answer
Enterprise support It's very expensive. $7k+. The team constantly tried to get me to buy even though the community edition wasn't working great
Autonomous Operator It's actually just a poorly configured Kubernetes role that no matter what I did, I couldn't get it to work. The support team was useless. Same lack of documentation. If you do get it to work, you need 6 servers at least to meet their minimum requirements.
Couchbase cloud Typical for Couchbase, the user experience is awful and I could never get it to work.
Minimum requirements
The minimum requirements in production are 6 servers. On AWS the calculated monthly cost would be ~$600
. We achieved better performance using a $16
MongoDB instance on the Mongo Atlas Cloud
writing queries is a nightmare While N1QL is similar to SQL and it's easier to write because of the familiarity, that isn't entirely true. The "smart index" that Couchbase advertises is not smart at all. Creating an index with 5 fields, and only using 4 of them won't result in Couchbase using the same index, so you have to create a new one.
Couchbase UI
The UI that comes with every database deployment is full of bugs, barely functional and the developer experience is poor. When I asked Couchbase about it, they basically said they don't care because real developers use SQL directly from code
Consumes too much RAM
Couchbase is shipped with a smaller Memcached instance to handle the in-memory cache. Memcached ends up using 8 GB of RAM for 5000 documents
! I'm not kidding! We had less than 5000 docs on a Couchbase instance and less than 20 indexes and RAM consumption was always over 8 GB
Memory allocations are useless I asked the Couchbase team a question: If a bucket has 1 GB allocated, what happens when I have more than 1GB stored? Does it overflow? Does it cache somewhere? Do I get an error? I always received the same answer: If you buy the Couchbase enterprise then we can guide you.
We actually use both Mongo and SQL databases in production. Mongo excels in both speed and developer friendliness when it comes to geospatial data and queries on the geospatial data, but we also like ACID compliance hence most of our other data (except on-site logs) are stored in a SQL Database (MariaDB for now)
MySQL has a lot of strengths working for it. It's simple and easy to set up and use. It's JSON engine is also really good these days. Mongo is also simple to setup and use, and it's speed as a document-object storage engine is first class.
Where Postgres has both beat is in it's combining of all of the features that make both MySQL and Mongo great, while adding on enterprise grade level scalability and replication. It's Postgres' stability and robustness, while still fulfilling the roles of it's contemporaries extremely well that edge Postgre for me.
When I was new with web development, I was using PHP for backend and MySQL for database. But after improving my JS skills, I chosen Node.js. Because of too many reasons including npm, express, community, fast coding and etc. MongoDB is so good for using with Node.js. If your JS skills are enough good, I recommend to migrate to Node.js and MongoDB.
Pros of HBase
- Performance9
- OLTP5
- Fast Point Queries1
Pros of MongoDB
- Document-oriented storage827
- No sql593
- Ease of use553
- Fast464
- High performance410
- Free255
- Open source218
- Flexible180
- Replication & high availability145
- Easy to maintain112
- Querying42
- Easy scalability39
- Auto-sharding38
- High availability37
- Map/reduce31
- Document database27
- Easy setup25
- Full index support25
- Reliable16
- Fast in-place updates15
- Agile programming, flexible, fast14
- No database migrations12
- Easy integration with Node.Js8
- Enterprise8
- Enterprise Support6
- Great NoSQL DB5
- Support for many languages through different drivers4
- Schemaless3
- Aggregation Framework3
- Drivers support is good3
- Fast2
- Managed service2
- Easy to Scale2
- Awesome2
- Consistent2
- Good GUI1
- Acid Compliant1
Sign up to add or upvote prosMake informed product decisions
Cons of HBase
Cons of MongoDB
- Very slowly for connected models that require joins6
- Not acid compliant3
- Proprietary query language2