Kafka vs RocksDB

Overview

Kafka: Stacks 24.2K, Followers 22.3K, Votes 607, GitHub Stars 31.2K, Forks 14.8K

RocksDB: Stacks 141, Followers 290, Votes 11, GitHub Stars 30.9K, Forks 6.6K

Kafka vs RocksDB: What are the differences?

Introduction

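Kafka and RocksDB are both widely used for data processing and storage, but they solve different problems: Kafka is a distributed event-streaming platform, while RocksDB is an embedded key-value storage library. The key differences are as follows.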

Scalability: Kafka is designed for high throughput and scales horizontally: topics are split into partitions spread across brokers, so adding brokers adds capacity. RocksDB is an embedded key-value store optimized for low-latency, high-performance operations on a single machine; it runs as a library inside the application process. Kafka excels at handling large streams of data, whereas RocksDB is better suited to applications that need fast key-value access.
Data Persistence: Both systems persist data to disk, but in different forms. Kafka writes every message to an append-only log on the brokers' disks and does not depend on an external store such as RocksDB for this; Kafka Streams, a separate client library, uses RocksDB by default for its local state stores. RocksDB persists key-value pairs in a log-structured merge-tree and handles large on-disk datasets efficiently, making it suitable for applications that need long-lived, updatable state.
Message Retention: Kafka retains messages for a configurable time or until a partition reaches a configurable size, regardless of whether they have been consumed, so consumers can catch up on missed data or re-read old data. Once a limit is reached, the oldest log segments are deleted. RocksDB has no retention window by default: a key stays until the application overwrites or deletes it, and writes are recorded in a write-ahead log so that they can be recovered after a crash or restart.
Processing Model: Kafka provides a publish-subscribe model in which producers publish messages to topics and consumers subscribe to those topics. Around this core, Kafka Streams supports real-time stream processing, and Kafka Connect moves data between Kafka and external systems through connectors. RocksDB is a storage engine that the application calls directly through get, put, delete, and iterator operations; it has no messaging or stream-processing features of its own.
Fault Tolerance: Kafka is designed to be highly fault-tolerant. Each partition is replicated across multiple brokers, and if the broker leading a partition fails, a replica is elected leader, so acknowledged messages survive a broker failure when replication is configured appropriately. RocksDB has no built-in replication: it protects against process crashes on a single machine, and any replication across machines must be provided by the application or by a system built on top of it.
Use Cases: Kafka is commonly used for real-time streaming data pipelines, event-driven architectures, and ingestion into data lakes or analytics systems. Its high throughput and integration with many processing frameworks make it suitable for real-time analytics, log aggregation, and event sourcing. RocksDB is often used as an embedded database in applications that need fast key-value access, such as caching systems, indexing engines, and the storage layer of larger distributed systems.
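
The log-oriented behavior described in the list above (key-based partitioning, consumer offsets, size-based retention) can be illustrated with a toy in-memory model. This is only a sketch: ToyLog and its parameters are hypothetical names invented for this example, and it does not use or imitate the real Kafka client API.

```python
import hashlib
from collections import defaultdict


class ToyLog:
    """Toy partitioned append-only log (illustrative; not Kafka's API)."""

    def __init__(self, num_partitions=2, max_records_per_partition=3):
        self.num_partitions = num_partitions
        self.max_records = max_records_per_partition
        # Each partition holds (offset, key, value) tuples in append order.
        self.partitions = defaultdict(list)
        self.next_offset = defaultdict(int)

    def partition_for(self, key):
        # Stable hash so the same key always maps to the same partition.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % self.num_partitions

    def append(self, key, value):
        p = self.partition_for(key)
        offset = self.next_offset[p]
        self.partitions[p].append((offset, key, value))
        self.next_offset[p] = offset + 1
        # Size-based retention: drop the oldest records beyond the limit.
        excess = len(self.partitions[p]) - self.max_records
        if excess > 0:
            del self.partitions[p][:excess]
        return p, offset

    def read(self, partition, from_offset):
        # Consumers track their own position and read forward from it.
        return [r for r in self.partitions[partition] if r[0] >= from_offset]


log = ToyLog(num_partitions=1, max_records_per_partition=3)
for i in range(5):
    log.append("user-1", f"event-{i}")

# Offsets 0 and 1 have aged out, so a consumer asking for offset 0
# gets the oldest retained record instead.
retained = log.read(partition=0, from_offset=0)
print([offset for offset, _, _ in retained])  # [2, 3, 4]
```

Records are never updated in place here: retention removes old entries from the front of the log, and offsets keep increasing, which is why a slow consumer can miss data that has already aged out.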

In summary, Kafka is a distributed streaming platform optimized for high throughput, durable replicated logs, and real-time stream processing, while RocksDB is an embedded key-value store optimized for low-latency reads and writes with persistent storage on a single machine.
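
To make the contrast concrete, the sketch below folds an ordered list of change records into a latest-value table. It is an illustration under stated assumptions: the list stands in for a single log partition, the plain dict stands in for an embedded key-value store, the materialize function is a hypothetical helper, and treating None as a delete marker is a convention chosen for this example. Neither real system's API is used.

```python
def materialize(records):
    """Fold ordered (key, value) records into latest-value state.

    A value of None is treated as a delete marker (an assumption of
    this sketch).
    """
    state = {}
    for key, value in records:
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state


# The log keeps every change, in order.
changes = [
    ("cart-1", "1 item"),
    ("cart-2", "3 items"),
    ("cart-1", "2 items"),
    ("cart-2", None),
]

# The key-value view keeps only the current value for each key.
table = materialize(changes)
print(table)  # {'cart-1': '2 items'}
```

The log answers "what happened, and in what order?", while the key-value view answers "what is the value for this key right now?".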


Advice on Kafka, RocksDB

viradiya

Apr 12, 2020

Needs advice on AngularJS, ASP.NET Core, MSSQL

We are going to develop a microservices-based application. It consists of AngularJS, ASP.NET Core, and MSSQL.

We have three types of microservices: Emailservice, Filemanagementservice, and Filevalidationservice.

I am a beginner in microservices. I have read about RabbitMQ and have since learned that Redis and Kafka are also options, so I want to know which is best.

933k views

Ishfaq

Feb 28, 2020

Needs advice

Our backend application sends external messages to a third-party application at the end of each backend (CRUD) API call from the UI. These external messages take too much extra time (building the message, processing it, sending it to the third party, and logging success or failure), and the UI application has no interest in them.

So we currently send these third-party messages from a new child thread at the end of each REST API call, so that the UI application doesn't wait for the extra third-party API calls.

I want to integrate Apache Kafka for these third-party API calls so that I can queue them, retry failed calls, and handle logging, etc. (Currently the third-party messages are sent from multiple threads at the same time, which uses too much processing and resources.)

Question 1: Is this a use case of a message broker?

Question 2: If it is, which is better, Kafka or RabbitMQ?

804k views

Roman

Senior Back-End Developer, Software Architect

Feb 12, 2019

Review on Kafka

I use Kafka because it has almost infinite scalability in terms of processing events (it can be scaled to process hundreds of thousands of events) and great monitoring (all sorts of metrics are exposed via JMX).

Downsides of using Kafka are:

you have to deal with Zookeeper
you have to implement advanced routing yourself (compared to RabbitMQ it has no advanced routing)

10.9k views

Detailed Comparison

Kafka	RocksDB
Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design.	RocksDB is an embeddable persistent key-value store for fast storage. RocksDB can also be the foundation for a client-server database but our current focus is on embedded workloads. RocksDB builds on LevelDB to be scalable to run on servers with many CPU cores, to efficiently use fast storage, to support IO-bound, in-memory and write-once workloads, and to be flexible to allow for innovation.
Written at LinkedIn in Scala;Used by LinkedIn to offload processing of all page and other views;Defaults to using persistence, uses OS disk cache for hot data (has higher throughput than any of the above having persistence enabled);Supports both online and offline processing	Designed for application servers wanting to store up to a few terabytes of data on locally attached Flash drives or in RAM;Optimized for storing small to medium size key-values on fast storage -- flash devices or in-memory;Scales linearly with number of CPUs so that it works well on ARM processors
Pros & Cons
Pros: High-throughput (126), Distributed (119), Scalable (92), High-Performance (86), Durable (66). Cons: Non-Java clients are second-class citizens (32), Needs Zookeeper (29), Operational difficulties (9), Terrible Packaging (5)	Pros: Very fast (5), Made by Facebook (3), Consistent performance (2), Ability to add logic to the database layer where needed (1)

What are some alternatives to Kafka, RocksDB?

MongoDB

MongoDB stores data in JSON-like documents that can vary in structure, offering a dynamic, flexible schema. MongoDB was also designed for high availability and scalability, with built-in replication and auto-sharding.

MySQL

The MySQL software delivers a very fast, multi-threaded, multi-user, and robust SQL (Structured Query Language) database server. MySQL Server is intended for mission-critical, heavy-load production systems as well as for embedding into mass-deployed software.

PostgreSQL

PostgreSQL is an advanced object-relational database management system that supports an extended subset of the SQL standard, including transactions, foreign keys, subqueries, triggers, user-defined types and functions.

RabbitMQ

RabbitMQ gives your applications a common platform to send and receive messages, and your messages a safe place to live until received.

Microsoft SQL Server

Microsoft® SQL Server is a database management and analysis system for e-commerce, line-of-business, and data warehousing solutions.

SQLite

SQLite is an embedded SQL database engine. Unlike most other SQL databases, SQLite does not have a separate server process. SQLite reads and writes directly to ordinary disk files. A complete SQL database with multiple tables, indices, triggers, and views, is contained in a single disk file.

Cassandra

Partitioning means that Cassandra can distribute your data across multiple machines in an application-transparent manner. Cassandra will automatically repartition as machines are added and removed from the cluster. Row store means that, like relational databases, Cassandra organizes data by rows and columns. The Cassandra Query Language (CQL) is a close relative of SQL.

Memcached

Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.

MariaDB

Started by core members of the original MySQL team, MariaDB actively works with outside developers to deliver the most featureful, stable, and sanely licensed open SQL server in the industry. MariaDB is designed as a drop-in replacement of MySQL(R) with more features, new storage engines, fewer bugs, and better performance.

RethinkDB

RethinkDB is built to store JSON documents, and scale to multiple machines with very little effort. It has a pleasant query language that supports really useful queries like table joins and group by, and is easy to set up and learn.
