MongoDB vs RocksDB: What are the differences?
What is MongoDB? The database for giant ideas. MongoDB stores data in JSON-like documents that can vary in structure, offering a dynamic, flexible schema. MongoDB was also designed for high availability and scalability, with built-in replication and auto-sharding.
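To make that flexible schema concrete, here is a minimal sketch using the official pymongo driver; the database, collection, and field names are hypothetical:

```python
# Sketch: differently shaped documents living in one MongoDB collection.
# Assumes a local mongod and the official pymongo driver; the "shop"
# database and "products" collection are made-up examples.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
products = client["shop"]["products"]

# No schema migration is needed to store documents with different shapes.
products.insert_one({"name": "T-shirt", "size": "M", "tags": ["cotton"]})
products.insert_one({"name": "Phone", "specs": {"ram_gb": 8, "storage_gb": 128}})

# Query on a nested field that only some documents have.
for doc in products.find({"specs.ram_gb": {"$gte": 8}}):
    print(doc["name"])
```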
What is RocksDB? An embeddable persistent key-value store for fast storage, developed and maintained by the Facebook Database Engineering Team. RocksDB can also serve as the foundation for a client-server database, but the current focus is on embedded workloads. RocksDB builds on LevelDB to scale to servers with many CPU cores, to use fast storage efficiently, to support IO-bound, in-memory, and write-once workloads, and to stay flexible enough to allow for innovation.
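Because RocksDB is embedded, an application opens the database files directly in its own process rather than connecting to a server. A minimal sketch, assuming the community python-rocksdb bindings; the path and keys are made up:

```python
# Sketch: RocksDB as an in-process key-value store, via the community
# python-rocksdb bindings. The path and keys are example values.
import rocksdb

opts = rocksdb.Options(create_if_missing=True)
db = rocksdb.DB("example.db", opts)   # opens/creates files on local disk

db.put(b"user:1", b"alice")           # keys and values are raw bytes
print(db.get(b"user:1"))              # b'alice'

# Batched writes are applied atomically, which suits write-heavy workloads.
batch = rocksdb.WriteBatch()
batch.put(b"user:2", b"bob")
batch.delete(b"user:1")
db.write(batch)
```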
MongoDB and RocksDB can be categorized as "Databases" tools.
"Document-oriented storage" is the top reason why over 788 developers like MongoDB, while over 2 developers mention "Very fast" as the leading cause for choosing RocksDB.
MongoDB and RocksDB are both open source tools. MongoDB, with 16.3K GitHub stars and 4.1K forks, appears to have broader adoption than RocksDB, with 14.3K stars and 3.12K forks.
Uber Technologies, Lyft, and Codecademy are some of the popular companies that use MongoDB, whereas RocksDB is used by Facebook, LinkedIn, and Skry, Inc. MongoDB has broader approval, being mentioned in 2189 company stacks and 2218 developer stacks, compared to RocksDB, which is listed in 6 company stacks and 7 developer stacks.
I started using MongoDB because it was much easier to implement in production than hosted SQL, and I found that a lot of the limitations you expect from a document store versus a relational database were overcome by connecting the application to a GraphQL API, making retrieval seamless. MongoDB's latest upgrades, along with Stitch and MongoDB Mobile, make it a perfect fit, especially if your application will be cross-platform across web and mobile.
In my opinion PostgreSQL wins over MongoDB: it not only works with structured data, SQL, and strict types, but also has excellent support for unstructured data as a separate data type (you can store arbitrary JSON, which can also be queryable, depending on the format you choose). Both writes and reads are much faster than in Mongo. So you can get the best of document NoSQL and SQL in a single database.
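As a small illustration of that JSONB support, here is a sketch using psycopg2; the connection string and the `events` table are hypothetical:

```python
# Sketch: storing and querying unstructured JSONB data in PostgreSQL
# via psycopg2. The DSN and the "events" table are hypothetical.
import json
import psycopg2

conn = psycopg2.connect("dbname=app user=app")
cur = conn.cursor()

cur.execute("CREATE TABLE IF NOT EXISTS events "
            "(id serial PRIMARY KEY, payload jsonb)")
cur.execute("INSERT INTO events (payload) VALUES (%s)",
            (json.dumps({"type": "click", "user": "alice"}),))

# The @> containment operator can be backed by a GIN index on the column.
cur.execute("SELECT payload->>'user' FROM events WHERE payload @> %s",
            (json.dumps({"type": "click"}),))
print(cur.fetchall())
conn.commit()
```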
A formal downside of PostgreSQL is clustering scalability: there's no simple way to build a distributed cluster. However, two points:
1) Thanks to PG's efficiency, you will have much more time before you actually need to scale. And if you follow the database-per-service pattern, you may never need to, because handling a few billion records on a single machine is a realistic option for PG.
2) When you do need to scale, you can do it in exactly the way you need, including as part of the app's logic (e.g. sharding by key, or a PG-based clustering solution with a strict model). That scalability is very transparent, and much more predictable than Mongo's "the cluster just works (but then fails)" replication.
I started using PostgreSQL because I started a job at a company that was already using it as well as MongoDB. The main difference between the two, from my perspective, is that Postgres columns are a chore to add/remove/modify, whereas you can throw whatever you want into a Mongo collection. And personally I prefer the query language for Postgres over that of Mongo, but they both have their merits. Maybe someday I'll be a DBA and have more insight to share, but for now that's my 2 cents.
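To make the query-language comparison concrete, here is a sketch of the same lookup in both systems; the `users` table/collection and its fields are made up, and only the MongoDB side actually executes:

```python
# Sketch contrasting the two query languages on a hypothetical "users"
# dataset. Server address, names, and fields are assumptions.
from pymongo import MongoClient

# PostgreSQL: a declarative SQL string.
sql = "SELECT name, age FROM users WHERE age > 30 ORDER BY age DESC"

# MongoDB: the filter and projection are plain Python dicts.
users = MongoClient("mongodb://localhost:27017")["app"]["users"]
cursor = users.find({"age": {"$gt": 30}}, {"name": 1, "age": 1}).sort("age", -1)
for doc in cursor:
    print(doc)
```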
Back at the start of 2017, we decided to create a web-based tool for SEO on-page analysis of our clients' websites. We had over 2,000 websites to analyze, so we had to perform thousands of requests to fetch every single page from those websites, process the information, and store large amounts of data somewhere.
Very soon we realized that our initial choice of scripting language and database, PHP with Laravel and MySQL, was not going to cope efficiently with such a task.
By that time, we were experimenting on other projects with a language we had recently gotten to know, Go, so we decided to give it a try and write the crawler in it. It was fantastic: we could process much more data with far less CPU power and in less time. By using the concurrency abilities the language offers, we could also make more HTTP requests in less time.
Unfortunately, I have no numbers to show about the performance differences between Go and PHP: the difference was so clear from the beginning that we didn't feel the need to run further comparison tests or document them. We just switched fully to Go.
There was still a problem. Despite the large amount of data we were generating, MySQL was performing very well, but as we added more and more features, and with those features more and more different types of data to save, it became a nightmare for the database architects to structure everything correctly in the database. So it was clear what we had to do next: switch to a NoSQL database. We switched to MongoDB, and it was also fantastic: we spent almost zero time thinking about how to structure the database, and performance also seemed better, though again I have no comparison numbers to show, due to lack of time.
As of now, we not only use the tool internally but have also opened it up for everyone to use for free: https://tool-seo.com
Recently we were looking at a few robust and cost-effective ways of replicating the data that resides in our production MongoDB to a PostgreSQL database for data warehousing and business intelligence.
We set ourselves the following criteria for the optimal tool for the job:
- The data replication must be near real-time, yet it should NOT impact the production database.
- The data replication must be horizontally scalable (based on the load), asynchronous, and crash-resilient.
Based on the above criteria, we selected the following tools to perform the end to end data replication:
We chose MongoDB Stitch for picking up the changes in the source database. Stitch is the serverless platform from MongoDB, and one of the services it offers is Stitch Triggers. Using Stitch Triggers, you can execute a serverless function (in Node.js) in real time in response to changes in the database. When there are a lot of database changes, Stitch automatically "feeds forward" these changes through an asynchronous queue.
We chose Amazon SQS as the pipe / message backbone for communicating the changes from MongoDB to our own replication service. Conveniently, MongoDB Stitch offers integration with AWS services.
In the Node.js function, we wrote minimal functionality to communicate the database changes (insert / update / delete / replace) to Amazon SQS.
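The original function was Node.js inside Stitch; purely to illustrate the equivalent logic, here is a Python sketch that forwards one change event to SQS with boto3. The queue URL, region, and event shape are assumptions:

```python
# Sketch: forward a MongoDB change event to Amazon SQS with boto3.
# The real function ran as Node.js inside Stitch; this only mirrors the
# logic. Queue URL, region, and event shape are assumptions.
import json
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/mongo-changes"

def forward_change(change_event: dict) -> None:
    """Serialize one insert/update/delete/replace event onto the queue."""
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({
            "operation": change_event.get("operationType"),
            "document": change_event.get("fullDocument"),
        }, default=str),
    )
```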
Next, we wrote a minimal micro-service in Python to listen for message events on SQS, pick up the data payload, and mirror the DB changes onto the target data warehouse. We implemented the source-to-target data translation by modelling the target table structures with SQLAlchemy. We deployed this micro-service as an AWS Lambda with Zappa. With Zappa, deploying your services as event-driven, horizontally scalable Lambdas is dead easy.
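A condensed sketch of what such a consumer might look like; the queue URL, table model, and connection string are hypothetical stand-ins for the real service:

```python
# Sketch: poll SQS and mirror change events into PostgreSQL via
# SQLAlchemy. Queue URL, model, and DSN are hypothetical stand-ins.
import json
import boto3
from sqlalchemy import create_engine, Column, String, JSON
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class EventRow(Base):
    __tablename__ = "warehouse_events"
    id = Column(String, primary_key=True)
    payload = Column(JSON)

engine = create_engine("postgresql+psycopg2://app@localhost/warehouse")
Base.metadata.create_all(engine)

sqs = boto3.client("sqs", region_name="us-east-1")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/mongo-changes"

def drain_once() -> None:
    """Pull one batch of messages and upsert them into the warehouse."""
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10,
                               WaitTimeSeconds=10)
    with Session(engine) as session:
        for msg in resp.get("Messages", []):
            body = json.loads(msg["Body"])
            session.merge(EventRow(id=msg["MessageId"], payload=body))
            sqs.delete_message(QueueUrl=QUEUE_URL,
                               ReceiptHandle=msg["ReceiptHandle"])
        session.commit()
```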
In the end, we implemented a highly scalable, near real-time Change Data Replication service that "just works", and deployed it to production in a matter of days!