RocksDB vs SQLite: What are the differences?
Introduction
This page compares RocksDB and SQLite and highlights their key differences.
Storage Architecture: RocksDB is an embedded key-value storage engine optimized for fast storage such as solid-state drives (SSDs). Its log-structured merge-tree (LSM) design makes it well suited to write-intensive workloads and high-performance applications. SQLite is also an embedded, serverless engine, but it is a relational database: self-contained, zero-configuration, and storing an entire database in a single file.
Concurrency Control: RocksDB is designed for multi-threaded use: many threads in the same process can read and write concurrently, which lets it take advantage of modern multi-core processors. SQLite allows many concurrent readers but only one writer at a time: a write transaction locks the whole database, so concurrent writes are serialized. In write-ahead log (WAL) mode, readers and the single writer do not block each other.
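The single-writer rule can be observed with Python's built-in sqlite3 module (RocksDB has no binding in the Python standard library, so only the SQLite side is shown). This is a minimal sketch, assuming a throwaway database file and a hypothetical kv table: in WAL mode a reader keeps seeing the last committed value while a write transaction is open, and a second writer is refused until the first one commits.

```python
import os
import sqlite3
import tempfile

# WAL mode requires a database file; a temporary one is used here.
path = os.path.join(tempfile.mkdtemp(), "demo.db")

def connect(path):
    # isolation_level=None: autocommit, so transactions are controlled with
    # explicit BEGIN/COMMIT. timeout=0: fail at once instead of waiting for a lock.
    return sqlite3.connect(path, isolation_level=None, timeout=0)

def read_value(conn):
    # fetchall() runs the statement to completion, ending the read transaction.
    return conn.execute("SELECT v FROM kv WHERE k = 'a'").fetchall()[0][0]

writer = connect(path)
journal_mode = writer.execute("PRAGMA journal_mode=WAL").fetchall()[0][0]
writer.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
writer.execute("INSERT INTO kv VALUES ('a', 'old')")
reader = connect(path)
second_writer = connect(path)

# The first writer takes the database's single write lock.
writer.execute("BEGIN IMMEDIATE")
writer.execute("UPDATE kv SET v = 'new' WHERE k = 'a'")

# Readers are not blocked and still see the last committed value.
seen_during_write = read_value(reader)

# A second writer cannot start a write transaction until the first finishes.
try:
    second_writer.execute("BEGIN IMMEDIATE")
    second_writer_blocked = False
except sqlite3.OperationalError:  # "database is locked"
    second_writer_blocked = True

writer.execute("COMMIT")
seen_after_commit = read_value(reader)
print(journal_mode, seen_during_write, second_writer_blocked, seen_after_commit)

for conn in (writer, reader, second_writer):
    conn.close()
```

With a nonzero timeout the second writer would wait for the lock instead of failing immediately; either way, writes are serialized.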
Data Durability: RocksDB provides configurable durability, letting users trade write performance against data safety. Writes go to a write-ahead log (WAL); each write can either be synced to disk before it is acknowledged or left to reach disk asynchronously, and the WAL can even be disabled for maximum speed. SQLite transactions are ACID: by default, a committed transaction has been synced to disk before COMMIT returns, and this guarantee can be relaxed with PRAGMA synchronous.
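RocksDB makes the durability choice per write through its sync write option; SQLite's closest counterpart is PRAGMA synchronous, set per connection. The sketch below, assuming an in-memory database used only to show the setting, switches between the levels and reads each one back (SQLite reports them as integers: 0 = OFF, 1 = NORMAL, 2 = FULL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def synchronous_level(conn):
    # PRAGMA synchronous with no argument returns the current level as an integer.
    return conn.execute("PRAGMA synchronous").fetchone()[0]

# FULL: wait until data is safely on disk at every commit (safest, slowest).
conn.execute("PRAGMA synchronous = FULL")
full = synchronous_level(conn)

# NORMAL: sync less often; in WAL mode a crash or power loss may roll back the
# most recent commits, but the database stays consistent.
conn.execute("PRAGMA synchronous = NORMAL")
normal = synchronous_level(conn)

# OFF: hand data to the OS without syncing; fastest, but an OS crash or power
# loss can corrupt the database.
conn.execute("PRAGMA synchronous = OFF")
off = synchronous_level(conn)

print(full, normal, off)  # 2 1 0
conn.close()
```

A common compromise is WAL mode with synchronous = NORMAL, which gives up the durability of the last few commits on power loss in exchange for fewer disk syncs.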
SQL Support: RocksDB is an embedded library that does not natively support SQL. It exposes a key-value API (put, get, delete, and ordered iteration over keys) with no query language and no built-in relational operations such as joins. SQLite is a full-fledged relational database management system (RDBMS) with extensive SQL support, including complex queries, joins, and table operations.
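The practical difference shows up as soon as data spans two entities. In this sketch a Python dict stands in for a key-value store such as RocksDB, and the "user:<id>" / "order:<id>" key layout and the "<user id>|<item>" value encoding are hypothetical: the application must decode values and perform the join itself, whereas SQLite's SQL engine does it from a declarative query.

```python
import sqlite3

# Key-value view (a dict stands in for the store; key layout is hypothetical).
kv = {
    "user:1": "alice",
    "user:2": "bob",
    "order:100": "1|book",  # value encodes "<user id>|<item>"
    "order:101": "2|lamp",
    "order:102": "1|pen",
}

def orders_with_names_kv(store):
    """Manual join: scan order keys, decode each value, look up the user."""
    result = []
    for key in sorted(store):
        if key.startswith("order:"):
            user_id, item = store[key].split("|")
            result.append((store["user:" + user_id], item))
    return result

# Relational view: the same data as tables, joined by the SQL engine.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         user_id INTEGER REFERENCES users(id), item TEXT);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES (100, 1, 'book'), (101, 2, 'lamp'), (102, 1, 'pen');
""")
sql_result = db.execute("""
    SELECT u.name, o.item
    FROM orders AS o JOIN users AS u ON u.id = o.user_id
    ORDER BY o.id
""").fetchall()

kv_result = orders_with_names_kv(kv)
print(kv_result)
print(sql_result)
```

Both produce the same pairs, but in the key-value version any change to the access pattern (filtering by item, grouping by user) means writing more application code, or maintaining extra keys that act as indexes.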
Scalability: RocksDB itself runs on a single node, but it handles large datasets efficiently there and is commonly used as the storage engine inside distributed systems that scale horizontally by sharding data across many RocksDB instances (TiKV is one example). SQLite, being a single-file database, has no built-in support for horizontal scaling. It is best suited to small and medium-sized applications that do not require distributed computing.
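Horizontal sharding of the kind described above is done by a routing layer that sits in front of independent storage instances. A minimal sketch of hash-based routing, assuming in-memory SQLite databases as stand-ins for separate RocksDB instances or machines (the ShardedStore class is hypothetical, not part of either project):

```python
import hashlib
import sqlite3

class ShardedStore:
    """Routes each key to one of several independent stores by hashing it."""

    def __init__(self, num_shards):
        self.shards = []
        for _ in range(num_shards):
            conn = sqlite3.connect(":memory:")  # stand-in for one storage instance
            conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
            self.shards.append(conn)

    def _shard_for(self, key):
        # A stable hash (Python's built-in hash() of str is randomized per
        # process), so the same key always maps to the same shard.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def put(self, key, value):
        conn = self._shard_for(key)
        with conn:  # commits the write
            conn.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (key, value))

    def get(self, key):
        row = self._shard_for(key).execute(
            "SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return row[0] if row else None

    def shard_sizes(self):
        return [c.execute("SELECT COUNT(*) FROM kv").fetchone()[0] for c in self.shards]

store = ShardedStore(4)
for i in range(1000):
    store.put(f"user:{i}", f"name-{i}")
print(store.get("user:42"))   # name-42
print(store.shard_sizes())    # four counts that add up to 1000
```

Plain modulo routing moves most keys when the shard count changes; production systems usually use consistent hashing or key ranges so that rebalancing touches only part of the data.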
Deployment and Integration: RocksDB is a C++ library (with bindings such as RocksJava) that developers compile and link into their applications. It exposes many tuning options, which gives flexibility but requires more manual configuration and setup. SQLite is a self-contained library with no external dependencies that can be deployed without a separate installation; it ships with many platforms and languages (Python's standard library includes the sqlite3 module) and integrates easily with most programming languages and frameworks.
In summary, RocksDB is an embedded key-value storage engine optimized for SSDs, offering high performance and concurrency and serving as a building block for scalable systems, but lacking native SQL support. SQLite is a serverless relational database engine with transactional durability, extensive SQL capabilities, and easy deployment, but limited write concurrency and scalability.
I am researching different querying solutions to handle ~1 trillion records of data (in the realm of a petabyte). The data is mostly textual. I have identified a few options: Milvus, HBase, RocksDB, and Elasticsearch. I was wondering if there is a good way to compare the performance of these options (or if anyone has already done something like this). I want to be able to compare the speed of ingesting and querying textual data from these tools. Does anyone have information on this or know where I can find some? Thanks in advance!
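One way to start comparing ingestion and query speed is a small harness that times the two phases separately on the same synthetic data. The sketch below, assuming random lowercase text and primary-key lookups, uses in-memory SQLite only because it ships with Python; for a real comparison you would replace the two timed sections with each candidate system's client and use a realistic sample of your own data. Numbers from a laptop-sized run say nothing about behavior at a trillion records.

```python
import random
import sqlite3
import string
import time

ALPHABET = string.ascii_lowercase + " "

def benchmark_sqlite(num_records=10_000, num_queries=1_000, text_length=100, seed=0):
    """Time bulk ingestion and point queries of synthetic text records.

    Returns (records ingested per second, queries per second)."""
    rng = random.Random(seed)
    records = [(i, "".join(rng.choices(ALPHABET, k=text_length)))
               for i in range(num_records)]
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")

    # Ingestion phase: one transaction for the whole batch.
    start = time.perf_counter()
    with conn:
        conn.executemany("INSERT INTO docs VALUES (?, ?)", records)
    ingest_seconds = time.perf_counter() - start

    # Query phase: random point lookups by primary key.
    ids = [rng.randrange(num_records) for _ in range(num_queries)]
    start = time.perf_counter()
    for doc_id in ids:
        row = conn.execute("SELECT body FROM docs WHERE id = ?", (doc_id,)).fetchone()
        if row is None:
            raise LookupError(doc_id)
    query_seconds = time.perf_counter() - start
    conn.close()

    # max() guards against a zero-length interval on very small runs.
    return (num_records / max(ingest_seconds, 1e-9),
            num_queries / max(query_seconds, 1e-9))

ingest_rate, query_rate = benchmark_sqlite()
print(f"ingest: {ingest_rate:,.0f} records/s, query: {query_rate:,.0f} queries/s")
```

For mostly textual data, the query phase would more realistically be full-text or similarity search rather than key lookups, and batch size and concurrency usually matter as much as the engine itself.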
You've probably come to a decision already, but for those reading, here are some resources we put together to help people learn more about Milvus and other databases: https://zilliz.com/comparison and https://github.com/zilliztech/VectorDBBench. I don't think they include RocksDB or HBase yet (you could request them on GitHub), but hopefully they help answer your Elasticsearch questions.
I need to add a DBMS to my stack, but I don't know which. I'm tempted to learn SQLite since it would be useful to me with its focus on local access without concurrency. However, doing so feels like I would be defeating the purpose of trying to expand my skill set since it seems like most enterprise applications have the opposite requirements.
To be able to apply what I learn to more projects, what should I try to learn? MySQL? PostgreSQL? Something else? Is there a comfortable middle ground between high applicability and ease of use?
You can easily start with SQLite. It's really easy to get started with because it is self-contained and doesn't require you to install any additional software. It has interfaces in almost every language, and there are GUIs too. Start by learning SQL basics and simpler data models and structures; there are many tutorials, including some on the official website. From there you will easily migrate to another database. MySQL could be next, since it's easier to learn at first and has more resources available. PostgreSQL is less widespread, more challenging, and has fewer resources, but once you have some experience with MySQL it is really easy to learn as well. All these technologies are widespread and used across the industry, so you won't make a wrong decision with any of them.
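To illustrate how little setup SQLite needs, here is a minimal first session using Python's built-in sqlite3 module (the students table and its columns are hypothetical). It covers the SQL basics mentioned above: creating a table, inserting rows, filtering, and aggregating.

```python
import sqlite3

# ":memory:" creates a throwaway database; pass a file name such as
# "learning.db" instead to keep the data on disk. No server is needed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL, grade INTEGER)"
)
with conn:  # commits the inserts as one transaction
    conn.executemany(
        "INSERT INTO students (name, grade) VALUES (?, ?)",
        [("Ana", 10), ("Ben", 11), ("Chloe", 10)],
    )

# "?" placeholders keep values out of the SQL text, which avoids SQL injection.
tenth_graders = conn.execute(
    "SELECT name FROM students WHERE grade = ? ORDER BY name", (10,)
).fetchall()
count_by_grade = conn.execute(
    "SELECT grade, COUNT(*) FROM students GROUP BY grade ORDER BY grade"
).fetchall()

print(tenth_graders)   # [('Ana',), ('Chloe',)]
print(count_by_grade)  # [(10, 2), (11, 1)]
conn.close()
```

The same SQL statements run largely unchanged on MySQL or PostgreSQL, although their Python drivers use different placeholder styles (for example %s instead of ?).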
A question you might want to think about is "What kind of experience do I want to gain by using a DBMS?". If your aim is to have experience with SQL and any related libraries and frameworks for your language of choice (Python, I think?), then it doesn't matter too much which one you pick. As others have said, SQLite would let you get started very easily, and would give you a reasonably standard (if a little basic) SQL dialect to work with.
If your aim is actually to get a bit of "operational" experience (things like which command-line tools come standard with the DBMS, how the DBMS handles multiple databases, when to use multiple schemas vs. multiple databases, some basic privilege management, etc.), then I would recommend PostgreSQL. SQLite's simplicity actually avoids most of these experiences, which is not helpful to you if that is what you hope to learn. MySQL has a few "quirks" in how it manages things like multiple databases, which may lead you to make worse decisions if you tried to take your experience over to a different DBMS, especially in bigger enterprise roles. PostgreSQL is kind of a happy middle ground here: being able to start PostgreSQL servers via Docker or docker-compose makes the actual day-to-day management pretty easy, while still giving you experience of the kinds of considerations I have listed above.
At Vital Beats we use PostgreSQL, largely because it offers a good balance between solid data management and backup on one hand and good standard command-line tools on the other. The command-line tools are essential for us because we deploy our solutions within Kubernetes/Docker, where graphical tools are not always appropriate. PostgreSQL is also almost universally supported by language libraries and frameworks, without our having to compromise on how we want to store and lay out our data.
MySQL is very popular, easy to install, and available as a managed service on most popular cloud offerings. Its support and default tooling (such as MySQL Workbench) are certainly a little more mature than what you'll find for Postgres.
Hi everyone! I am a high school student starting a massive project. I'm building a system that helps a boarding school connect better with its students and handle information more efficiently. As part of it, I am developing a website and an Android app. What's the best datastore I can use? The app needs to access student data from the main database, send push notifications, and show feed updates. What's the best approach? And what's the best tool for deploying the website and the database, with one deployment for testing and prototyping and an official one? Thanks in advance!!!!
Firebase has Android, iOS, and Web SDKs, plus a console where you can develop, manage, and monitor all the data and analytics from one place. The Firebase Realtime Database is good for online presence and instant feed updates, while Cloud Firestore is good for user profiles and other structured data records. Firebase has a UI SDK (FirebaseUI) that makes it easy to interface with the resources in the project, and with tons of tutorials and starter projects it should be easy to quickly build a decent prototype to iterate on. Since you said massive, use their pricing calculator to figure out whether your expected scale will be covered by the free quota or, if you go pay-as-you-go, whether the price is reasonable for your project.
Good luck with the project!
It sounds like a client-server setup with a central database. While SQLite is probably the simplest option, note that its performance is probably the worst of the top 20 or so choices you have. It differs from Firebase and MySQL (and most other databases) in that it is embedded in the application, although that application could be your server itself.
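Embedding SQLite in the server means the database file belongs to one server process, and the website and Android app reach the data only through that server's API. A minimal sketch of that arrangement as a standard-library WSGI app, assuming a hypothetical announcements table and /announcements endpoint (push notifications and authentication are out of scope):

```python
import json
import sqlite3

# The database lives inside the server process; clients never open it directly.
# ":memory:" keeps the demo self-contained; a real server would use a file path.
DB = sqlite3.connect(":memory:")
DB.execute("CREATE TABLE announcements (id INTEGER PRIMARY KEY, text TEXT NOT NULL)")
with DB:
    DB.execute("INSERT INTO announcements (text) VALUES (?)", ("Welcome back!",))

def app(environ, start_response):
    """Minimal WSGI app: GET /announcements returns the rows as JSON."""
    if environ["REQUEST_METHOD"] == "GET" and environ["PATH_INFO"] == "/announcements":
        rows = DB.execute("SELECT id, text FROM announcements ORDER BY id").fetchall()
        body = json.dumps([{"id": i, "text": t} for i, t in rows]).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json")])
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

To try it locally, wsgiref.simple_server.make_server("", 8000, app).serve_forever() from the standard library is enough; that server handles one request at a time, so the single connection is never shared between threads.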
MySQL would require a separate MySQL db server, which means either two servers (one for MySQL, and one to provide your specific services to your client app) or both running on a single server machine. There are many alternatives in the same category as MySQL, and a choice of relational databases or document (NoSQL) databases. But architecturally, they are in the same category as MySQL, a separate db server that your application server would get its data from.
Firebase is different again: it is a service already hosted by a company, providing many integrated features such as authentication and storage of user account info. It also takes care of many of the concerns of running a server, such as performance, scalability, and management. There are some negatives you should be aware of, though. First, any investment of time and code in Firebase is pretty much non-portable: you are stuck with Firebase going forward. If you needed to switch to a different service, not only would it be a different API, but it would be a different architecture, and much of your code would need to be discarded. Second, it's owned and run by Google now, so you have a large corporation backing it, but that also means they could decide to discontinue it without any real effect on Google's bottom line. Also, some folks would have concerns about storing data on Google servers. That said, if you are aware of these issues in advance, and especially since you are a high school student, I think Firebase is a fairly easy winner here. The server is already set up for you, the documentation is complete and rich, with lots of examples, and Google is not going away. The main concern is that if the project really is massive, the cost of the service could keep rising. I suspect, though, that it is not massive, even if everyone in the school used it: the number of concurrent connections would not be huge (probably not even into the hundreds, even with thousands of users).
I'd go with Firebase even though you will need to learn their API, because you'll need to learn something one way or another. SQLite is a bit of a toy database, and MySQL is a real one but you (or someone) would need to manage that server on top of needing to develop the server and client app. With Firebase, much of the server already exists, including a professionally hosted database. There are tons of high-level features provided and initial cost is somewhere between very low and zero.
Part of this depends on what language you want to write it in. With JavaScript, you could build a cross-platform client app (I'd use Vue.js + Vuetify for the UI, provide it as a web app, and optionally wrap it with Electron for desktop and Apache Cordova for mobile). The server could also be JavaScript: an Express-based REST API on Node.js, talking to Firebase for services.
If you were a Java developer though, all this goes out the window and I'd recommend a simple Java server with Javalin for REST API, and embedded ObjectDB for database storage (combined into one server). ObjectDB is very very fast and can be separated out into a scalable server if this became truly massive. But you would probably never need to go that far.
All of this is a lot of work. I hope this isn't for something like an assignment. It is on the order of 6 months of work if you know what you're doing, and all year if you're learning as you go.
I don't think you can go wrong with MySQL or PostgreSQL. Python + Postgres is a VERY well-supported stack and can do almost anything, and there are great visualization and administrative tools for both. There are some data-mismatch problems, however. Node.js/Python with MongoDB is a bit more modern and makes it trivial to "serialize" data with a sprinkling of indexes. If you're using Go, then RocksDB is a great high-performance data-modeling base (it's not relational, however). It's more like a building block for a key-value store, but it's ACID, so you CAN build relational systems on top. I've used LevelDB for other projects (Java/C); it has a similar architecture and works great on Android (Chrome uses it for its metadata storage). RocksDB/LevelDB can achieve millions of writes on cheap hardware thanks to their trade-offs.
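Building relational structure on an ordered key-value store mostly comes down to a key-encoding scheme plus prefix scans. In this sketch the OrderedKV class is a hypothetical in-memory stand-in for RocksDB/LevelDB (which keep keys sorted and let an iterator seek to a key and walk forward), and the table/primary-key/column layout is an assumption chosen for readability; real SQL layers on key-value stores typically use more compact encodings, such as one key per row.

```python
import bisect

class OrderedKV:
    """Stand-in for an ordered key-value store: byte keys kept in sorted
    order, with point reads and prefix (range) scans."""

    def __init__(self):
        self._keys = []
        self._values = {}

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key: bytes):
        return self._values.get(key)

    def scan_prefix(self, prefix: bytes):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            key = self._keys[i]
            yield key, self._values[key]
            i += 1

# Hypothetical layout: "<table>/<zero-padded primary key>/<column>" -> value.
# Zero-padding makes byte order match numeric order of primary keys.
def put_row(kv, table, pk, row):
    for column, value in row.items():
        kv.put(f"{table}/{pk:010d}/{column}".encode(), str(value).encode())

def get_row(kv, table, pk):
    prefix = f"{table}/{pk:010d}/".encode()
    return {key[len(prefix):].decode(): value.decode()
            for key, value in kv.scan_prefix(prefix)}

def scan_table(kv, table):
    """Full-table scan: one prefix scan, grouped by primary key."""
    rows = {}
    for key, value in kv.scan_prefix(f"{table}/".encode()):
        _, pk, column = key.decode().split("/")
        rows.setdefault(int(pk), {})[column] = value.decode()
    return rows

kv = OrderedKV()
put_row(kv, "users", 2, {"name": "bob", "age": 30})
put_row(kv, "users", 10, {"name": "carol", "age": 25})
print(get_row(kv, "users", 2))        # {'age': '30', 'name': 'bob'}
print(list(scan_table(kv, "users")))  # [2, 10]
```

Secondary indexes follow the same pattern: extra keys such as "users_by_name/<name>/<pk>" that the application writes alongside each row, ideally in one atomic batch.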
I'm very familiar with SQLite. It's personally my least favorite, but it's the most portable database format, and it does support ACID. I have many gripes, but the biggest issue is parallel access: you really need a single process/thread to own the data model and then use IPC to communicate with that process/thread. (The same could be said for LevelDB, but it's so efficient that it's almost never an issue.)
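The "single owner" pattern can be sketched within one process: one thread creates and uses the only connection, and every other thread sends it statements through a queue and waits for the reply. The SingleWriter class and events table below are hypothetical; across processes, the in-process queue would be replaced by a socket or pipe, as the IPC mentioned above.

```python
import queue
import sqlite3
import threading

class SingleWriter:
    """One thread owns the SQLite connection; other threads submit work
    through a queue instead of opening their own connections."""

    _STOP = object()

    def __init__(self, path):
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._run, args=(path,), daemon=True)
        self._thread.start()

    def _run(self, path):
        conn = sqlite3.connect(path)  # created, used and closed only in this thread
        conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, payload TEXT)")
        while True:
            item = self._requests.get()
            if item is self._STOP:
                break
            sql, params, reply = item
            try:
                with conn:  # commit on success, roll back on error
                    rows = conn.execute(sql, params).fetchall()
                reply.put(rows)
            except sqlite3.Error as exc:
                reply.put(exc)
        conn.close()

    def execute(self, sql, params=()):
        """Run one statement on the owner thread and wait for its result."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((sql, params, reply))
        result = reply.get()
        if isinstance(result, Exception):
            raise result
        return result

    def close(self):
        self._requests.put(self._STOP)
        self._thread.join()

writer = SingleWriter(":memory:")

def worker(n):
    for i in range(100):
        writer.execute("INSERT INTO events (payload) VALUES (?)", (f"worker{n}-{i}",))

threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = writer.execute("SELECT COUNT(*) FROM events")[0][0]
print(total)  # 400
writer.close()
```

Committing every statement individually keeps the sketch simple; a busier owner thread would usually batch several queued writes into one transaction.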
If you're using Java, then JavaDB/DerbyDB/HSQLDB are EXCELLENT systems: highly multi-threaded, with good stand-alone tools (embedded or TCP-connected), and perfect for unit tests. They can use anything from simple, dumb, portable formats (e.g., a text file containing only inserts) to classic journaled binary B-tree formats to pure in-memory storage. Java has a lot of overhead, so this is only really viable if you're already using Java in your project.
For high performance, MemSQL offers a MySQL API on top of a hybrid of in-memory indexes and an on-disk column database (it still feels like classic SQL to you, though). It falls into the MySQL swiss-army-knife toolkit.
Similarly, for in-memory there is Redis. It's absolutely a joy to work with, and it too is a specialty swiss army knife. Steer clear of Redis for primary data that you can't lose: while Redis does support persisting data, persistence isn't very efficient and will become the bottleneck. Redis is great for micro-queues, topics, stat aggregators, and message repositories (e.g., password-management systems, where writes are rare, so persistence is viable). Plus, I love that Redis uses a pure-text protocol, so I can netcat or telnet directly into it and do stuff.
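The text protocol mentioned above is RESP: a client sends each command as an array of length-prefixed strings, and Redis also accepts plain space-separated "inline" commands, which is what you type when you telnet or netcat into it. A minimal encoder sketch (the function name is hypothetical; real client libraries do this for you):

```python
def encode_resp_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings:
    "*<count>\\r\\n", then "$<byte length>\\r\\n<bytes>\\r\\n" per argument."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode("utf-8")
        # The length prefix counts bytes, not characters.
        parts.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(parts)

print(encode_resp_command("SET", "greeting", "hello"))
# b'*3\r\n$3\r\nSET\r\n$8\r\ngreeting\r\n$5\r\nhello\r\n'
```

Sending these bytes over a TCP connection (for example with socket.create_connection) to a Redis server, which listens on port 6379 by default, should produce a RESP reply such as +OK\r\n for SET or +PONG\r\n for PING.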
I've loved cloud data stores. Amazon DynamoDB and Google Bigtable are awesome!!! They're cheap compared to the normal hosting fees of an AWS EC2 instance, so you can play all day: put a terabyte up, then blow it away, and pay only for what you play with. It's a very, very different data model, though. They give you a very small set of tricks for complex data modeling, and you have to be clever and have enough foresight not to dig yourself into a hole (or let customers abuse expensive queries).
Then there's Cassandra/Hadoop (HBase). These are petabyte-scale databases (technically, so are DynamoDB/Bigtable). They're incredibly efficient at what they do, and they have a lot of plugins to do almost anything you need. I personally love these the best (and RocksDB/LevelDB are like their infant offspring). You can run these on your laptop (unlike the Amazon/Google engines above), but their discipline is very different from all the others above.
Backend:
- Considering that our main app functionality involves data processing, we chose Python as the programming language because it offers many powerful math libraries for data-related tasks. We will use Flask for the server due to its good integration with Python. We will use a relational database because it has good performance and we are mostly dealing with CSV files that have a fixed structure. We originally chose SQLite, but after realizing the limitations of file-based databases, we decided to switch to PostgreSQL, which has better compatibility with our hosting service, Heroku.
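CSV files with a fixed structure map directly onto a table schema, which is the main reason a relational database fits here. A minimal sketch with Python's csv and sqlite3 modules, assuming hypothetical columns (recorded_at, sensor, value); moving the same code to PostgreSQL would mainly mean swapping the connection for a PostgreSQL driver such as psycopg and its %s placeholder style.

```python
import csv
import io
import sqlite3

# Hypothetical CSV with a fixed structure (the same columns in every file).
CSV_TEXT = """recorded_at,sensor,value
2024-01-01T00:00:00,a,1.5
2024-01-01T00:01:00,a,2.0
2024-01-01T00:00:00,b,7.25
"""

def load_csv(conn, csv_file):
    """Parse rows with the known header and insert them in one transaction."""
    reader = csv.DictReader(csv_file)
    rows = [(r["recorded_at"], r["sensor"], float(r["value"])) for r in reader]
    with conn:
        conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (recorded_at TEXT, sensor TEXT, value REAL)")
loaded = load_csv(conn, io.StringIO(CSV_TEXT))
averages = conn.execute(
    "SELECT sensor, AVG(value) FROM readings GROUP BY sensor ORDER BY sensor"
).fetchall()
print(loaded, averages)  # 3 [('a', 1.75), ('b', 7.25)]
```

Converting value to float while loading catches malformed rows early, before they reach the database.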
Pros of RocksDB
- Very fast (5)
- Made by Facebook (3)
- Consistent performance (2)
- Ability to add logic to the database layer where needed (1)
Pros of SQLite
- Lightweight (163)
- Portable (135)
- Simple (122)
- SQL (81)
- Preinstalled on iOS and Android (29)
- Free (2)
- Tcl integration (2)
- Portable: a database on my USB, 'love it' (1)
Cons of RocksDB
Cons of SQLite
- Not for multi-process or multithreaded apps (2)
- Needs different binaries for each platform (1)