
Alternatives to Kafka

ActiveMQ, RabbitMQ, Amazon Kinesis, Apache Spark, and Akka are the most popular alternatives and competitors to Kafka.

What is Kafka and what are its top alternatives?

Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design.
Kafka is a tool in the Message Queue category of a tech stack.
Kafka is an open source tool with 27K GitHub stars and 13.4K GitHub forks.
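To make "partitioned commit log" concrete, here is a minimal in-memory sketch. It is not Kafka's client API: the `PartitionedLog` class and its methods are hypothetical, replication is left out, and the CRC32-modulo partitioner is an assumption chosen for determinism (Kafka's default partitioner hashes keys with murmur2). Each partition is an append-only list, every record gets the next offset in its partition, and records with the same key always land in the same partition, which is what keeps per-key ordering.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    key: str
    value: str
    offset: int


class PartitionedLog:
    """Toy model of a partitioned, append-only commit log (no replication)."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: list[list[Record]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # Deterministic key -> partition mapping (assumption; Kafka uses murmur2).
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key: str, value: str) -> tuple[int, int]:
        """Append a record and return (partition, offset)."""
        p = self.partition_for(key)
        offset = len(self.partitions[p])  # offsets grow by one within a partition
        self.partitions[p].append(Record(key, value, offset))
        return p, offset

    def read(self, partition: int, from_offset: int) -> list[Record]:
        # Reading never deletes anything; consumers just remember an offset.
        return self.partitions[partition][from_offset:]


log = PartitionedLog(num_partitions=3)
for i in range(4):
    log.append("user-42", f"event-{i}")
p, _ = log.append("user-42", "event-4")
print([r.value for r in log.read(p, 0)])
# prints ['event-0', 'event-1', 'event-2', 'event-3', 'event-4']
```

Because reads do not remove records, many independent readers can consume the same partition, unlike a classic queue where a delivered message is gone.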

Top Alternatives to Kafka

  • ActiveMQ

    Apache ActiveMQ is fast, supports many cross-language clients and protocols, comes with easy-to-use Enterprise Integration Patterns and many advanced features, and fully supports JMS 1.1 and J2EE 1.4. Apache ActiveMQ is released under the Apache 2.0 License. ...

  • RabbitMQ

    RabbitMQ gives your applications a common platform to send and receive messages, and your messages a safe place to live until received. ...

  • Amazon Kinesis

    Amazon Kinesis can collect and process hundreds of gigabytes of data per second from hundreds of thousands of sources, allowing you to easily write applications that process information in real time, from sources such as website clickstreams, marketing and financial information, manufacturing instrumentation and social media, and operational logs and metering data. ...

  • Apache Spark

    Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning. ...

  • Akka

    Akka is a toolkit and runtime for building highly concurrent, distributed, and resilient message-driven applications on the JVM. ...

  • Apache Storm

    Apache Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate. ...

  • Apache Flink

    Apache Flink is an open source system for fast and versatile data analytics in clusters. Flink supports batch and streaming analytics in one system. Analytical programs can be written in concise and elegant APIs in Java and Scala. ...

  • Redis

    Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache, and message broker. Redis provides data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs, geospatial indexes, and streams. ...

Kafka alternatives & related posts


ActiveMQ

A message broker written in Java together with a full JMS client
PROS OF ACTIVEMQ
  • 18
    Easy to use
  • 14
    Open source
  • 13
    Efficient
  • 10
    JMS compliant
  • 6
    High Availability
  • 5
    Scalable
  • 3
    Distributed Network of brokers
  • 3
    Persistence
  • 3
    Support XA (distributed transactions)
  • 1
    Docker delivery
  • 1
    Highly configurable
  • 0
    RabbitMQ
CONS OF ACTIVEMQ
  • 1
    ONLY Vertically Scalable
  • 1
    Support
  • 1
    Low resilience to exceptions and interruptions
  • 1
    Difficult to scale

related ActiveMQ posts

I want to choose a message queue with the following features: high availability, distribution, scalability, and monitoring. I have RabbitMQ, ActiveMQ, Kafka, and Apache RocketMQ in mind, but I am unsure which one to choose.

Naushad Warsi
Software developer at Klingelnberg
Shared insights on ActiveMQ and RabbitMQ

I use ActiveMQ because RabbitMQ has stopped supporting AMQP 1.0 and above, and the earlier versions of AMQP don't provide the functionality to support OAuth.

If OAuth is not required and we can go with AMQP 0.9, then I still recommend RabbitMQ.


RabbitMQ

Open source multiprotocol messaging broker
PROS OF RABBITMQ
  • 234
    It's fast and it works with good metrics/monitoring
  • 79
    Ease of configuration
  • 59
    I like the admin interface
  • 50
    Easy to set-up and start with
  • 21
    Durable
  • 18
    Standard protocols
  • 18
    Intuitive to work with through Python
  • 10
    Written primarily in Erlang
  • 8
    Simply superb
  • 6
    Completeness of messaging patterns
  • 3
    Scales to 1 million messages per second
  • 3
    Reliable
  • 2
    Distributed
  • 2
    Supports MQTT
  • 2
    Better than most traditional queue based message broker
  • 2
    Supports AMQP
  • 1
    Clusterable
  • 1
    Clear documentation for different scripting languages
  • 1
    Great UI
  • 1
    Inubit Integration
  • 1
    Better routing system
  • 1
    High performance
  • 1
    Runs on Open Telecom Platform
  • 1
    Delayed messages
  • 1
    Reliability
  • 1
    Open-source
CONS OF RABBITMQ
  • 9
    Too complicated cluster/HA config and management
  • 6
    Needs Erlang runtime. Need ops good with Erlang runtime
  • 5
    Configuration must be done first, not by your code
  • 4
    Slow

related RabbitMQ posts

James Cunningham
Operations Engineer at Sentry
Shared insights on Celery and RabbitMQ at Sentry

As Sentry runs throughout the day, there are about 50 different offline tasks that we execute—anything from “process this event, pretty please” to “send all of these cool people some emails.” There are some that we execute once a day and some that execute thousands per second.

Managing this variety requires a reliably high-throughput message-passing technology. We use Celery's RabbitMQ implementation, and we stumbled upon a great feature called Federation that allows us to partition our task queue across any number of RabbitMQ servers and gives us the confidence that, if any single server gets backlogged, others will pitch in and distribute some of the backlogged tasks to their consumers.
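The work-sharing behavior described above (an idle server helping drain another server's backlog) can be illustrated with a small simulation. This is only a sketch of the idea under simplifying assumptions: the `Node` class, the per-tick capacity model, and the "pull from the longest queue" rule are hypothetical and are not how the RabbitMQ Federation plugin is configured or implemented.

```python
from collections import deque


class Node:
    """One broker node with a local task queue and a fixed consumer capacity."""

    def __init__(self, name: str, capacity_per_tick: int) -> None:
        self.name = name
        self.capacity = capacity_per_tick
        self.queue: deque[str] = deque()
        self.processed: list[str] = []


def tick(nodes: list[Node]) -> None:
    """Each node drains its own queue first, then borrows from the busiest peer."""
    for node in nodes:
        budget = node.capacity
        while budget and node.queue:
            node.processed.append(node.queue.popleft())
            budget -= 1
        # Spare capacity left over: pull backlogged tasks from the longest queue.
        while budget:
            busiest = max(nodes, key=lambda n: len(n.queue))
            if busiest is node or not busiest.queue:
                break
            node.processed.append(busiest.queue.popleft())
            budget -= 1


a, b = Node("a", capacity_per_tick=2), Node("b", capacity_per_tick=2)
a.queue.extend(f"task-{i}" for i in range(6))  # node a is backlogged, b is idle
tick([a, b])
print(len(a.queue), a.processed, b.processed)
# prints 2 ['task-0', 'task-1'] ['task-2', 'task-3']
```

After one tick, the idle node has taken two of the backlogged tasks, so the backlog shrinks twice as fast as it would on node `a` alone.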

#MessageQueue


Around the time of their Series A, Pinterest’s stack included Python and Django, with Tornado and Node.js as web servers. Memcached / Membase and Redis handled caching, with RabbitMQ handling queueing. Nginx, HAProxy and Varnish managed static delivery and load balancing, with persistent data storage handled by MySQL.


Amazon Kinesis

Store and process terabytes of data each hour from hundreds of thousands of sources
PROS OF AMAZON KINESIS
  • 9
    Scalable
CONS OF AMAZON KINESIS
  • 3
    Cost
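Kinesis spreads records from many sources across shards by partition key: the key is hashed with MD5 to a 128-bit integer, and the record goes to the shard whose hash key range contains that value. The sketch below illustrates that routing step only. It is not an AWS SDK call; the helper names are hypothetical, and the even split of the hash space is an assumption (real shard ranges depend on how the stream was created and resharded).

```python
import hashlib

HASH_SPACE = 2**128  # MD5 digests are 128-bit integers


def shard_ranges(num_shards: int) -> list[tuple[int, int]]:
    """Split the hash key space into contiguous, inclusive ranges (even split assumed)."""
    size = HASH_SPACE // num_shards
    ranges = []
    for i in range(num_shards):
        start = i * size
        end = HASH_SPACE - 1 if i == num_shards - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key: str, ranges: list[tuple[int, int]]) -> int:
    """Return the index of the shard whose hash key range contains MD5(partition_key)."""
    hashed = int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")
    for index, (start, end) in enumerate(ranges):
        if start <= hashed <= end:
            return index
    raise ValueError("hash key ranges do not cover the key space")


ranges = shard_ranges(4)
print(shard_for("sensor-17", ranges))  # the same key maps to the same shard every time
```

Keys with very uneven traffic still concentrate on one shard, which is one reason partition-key choice matters for throughput (and, per the con above, for cost).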

related Amazon Kinesis posts

Praveen Mooli
Engineering Manager at Taylor and Francis

We are in the process of building a modern content platform to deliver our content through various channels. We decided to go with a microservices architecture because we needed to scale. The microservices architectural style is an approach to developing an application as a suite of small, independently deployable services built around specific business capabilities. You can gain modularity, extensive parallelism, and cost-effective scaling by deploying services across many distributed servers. Microservices modularity facilitates independent updates and deployments, and helps to avoid a single point of failure, which can help prevent large-scale outages. We also decided to use the event-driven architecture pattern, a popular distributed asynchronous architecture pattern used to produce highly scalable applications. An event-driven architecture is made up of highly decoupled, single-purpose event processing components that asynchronously receive and process events.

To build our #Backend capabilities we decided to use the following:
1. #Microservices - Java with Spring Boot, Node.js with ExpressJS, and Python with Flask
2. #Eventsourcingframework - Amazon Kinesis, Amazon Kinesis Firehose, Amazon SNS, Amazon SQS, AWS Lambda
3. #Data - Amazon RDS, Amazon DynamoDB, Amazon S3, MongoDB Atlas

To build #Webapps we decided to use Angular 2 with RxJS

#Devops - GitHub , Travis CI , Terraform , Docker , Serverless
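The event-driven pattern described in this post (decoupled, single-purpose components that asynchronously receive and process events) can be sketched in-process with `asyncio`. The `EventBus` class stands in for a real broker such as SNS/SQS or Kinesis and is hypothetical, as are the event name `content.published` and the two handlers; the point is only that the publisher does not know which components consume an event.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable

Handler = Callable[[dict], Awaitable[None]]


class EventBus:
    """In-process stand-in for a broker such as SNS/SQS or Kinesis (hypothetical)."""

    def __init__(self) -> None:
        self.handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self.handlers[event_type].append(handler)

    async def publish(self, event_type: str, payload: dict) -> None:
        # The publisher is unaware of its consumers; handlers run concurrently.
        await asyncio.gather(*(h(payload) for h in self.handlers[event_type]))


async def main() -> list[str]:
    log: list[str] = []
    bus = EventBus()

    async def index_content(event: dict) -> None:
        log.append(f"indexed {event['id']}")

    async def notify_channels(event: dict) -> None:
        log.append(f"notified {event['id']}")

    # Two single-purpose components react to the same event independently.
    bus.subscribe("content.published", index_content)
    bus.subscribe("content.published", notify_channels)
    await bus.publish("content.published", {"id": "article-1"})
    return log


print(sorted(asyncio.run(main())))
# prints ['indexed article-1', 'notified article-1']
```

Adding a third consumer means one more `subscribe` call, with no change to the publisher, which is the decoupling the post is after. A real deployment would also need durable delivery and retries, which this sketch omits.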

John Kodumal

As we've evolved or added additional infrastructure to our stack, we've biased towards managed services. Most new backing stores are Amazon RDS instances now. We do use self-managed PostgreSQL with TimescaleDB for time-series data—this is made HA with the use of Patroni and Consul.

We also use managed Amazon ElastiCache instances instead of spinning up Amazon EC2 instances to run Redis workloads, as well as shifting to Amazon Kinesis instead of Kafka.


Apache Spark

Fast and general engine for large-scale data processing
PROS OF APACHE SPARK
  • 61
    Open-source
  • 48
    Fast and Flexible
  • 8
    One platform for every big data problem
  • 8
    Great for distributed SQL like applications
  • 6
    Easy to install and to use
  • 3
    Works well for most data science use cases
  • 2
    Interactive Query
  • 2
    Machine learning library, streaming in real time
  • 2
    In memory Computation
CONS OF APACHE SPARK
  • 4
    Speed

related Apache Spark posts

Eric Colson
Chief Algorithms Officer at Stitch Fix

The algorithms and data infrastructure at Stitch Fix are housed in #AWS. Data acquisition is split between events flowing through Kafka and periodic snapshots of PostgreSQL DBs. We store data in an Amazon S3-based data warehouse. Apache Spark on Yarn is our tool of choice for data movement and #ETL. Because our storage layer (S3) is decoupled from our processing layer, we are able to scale our compute environment very elastically. We have several semi-permanent, autoscaling Yarn clusters running to serve our data processing needs. While the bulk of our compute infrastructure is dedicated to algorithmic processing, we also implemented Presto for ad hoc queries and dashboards.

Beyond data movement and ETL, most #ML centric jobs (e.g. model training and execution) run in a similarly elastic environment as containers running Python and R code on Amazon EC2 Container Service clusters. The execution of batch jobs on top of ECS is managed by Flotilla, a service we built in house and open sourced (see https://github.com/stitchfix/flotilla-os).

At Stitch Fix, algorithmic integrations are pervasive across the business. We have dozens of data products actively integrated into our systems. That requires a serving layer that is robust, agile, flexible, and allows for self-service. Models produced on Flotilla are packaged for deployment in production using Khan, another framework we've developed internally. Khan gives our data scientists the ability to quickly productionize the models they've developed with open source frameworks in Python 3 (e.g. PyTorch, sklearn) by automatically packaging them as Docker containers and deploying them to Amazon ECS. This gives our data scientists a one-click method of getting from their algorithms to production. We then integrate those deployments into a service mesh, which allows us to A/B test various implementations in our product.

For more info:

#DataScience #DataStack #Data

Conor Myhrvold
Tech Brand Mgr, Office of CTO at Uber

Why we built Marmaray, an open source generic data ingestion and dispersal framework and library for Apache Hadoop:

Built and designed by our Hadoop Platform team, Marmaray is a plug-in-based framework built on top of the Hadoop ecosystem. Users can add support to ingest data from any source and disperse it to any sink by leveraging Apache Spark. The name, Marmaray, comes from a tunnel in Turkey connecting Europe and Asia. Similarly, we envisioned Marmaray within Uber as a pipeline connecting data from any source to any sink depending on customer preference:

https://eng.uber.com/marmaray-hadoop-ingestion-open-source/

(Direct GitHub repo: https://github.com/uber/marmaray)
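The "any source to any sink" plug-in idea boils down to both sides implementing a shared interface, so the pipeline can pair them freely. The sketch below shows that shape in Python with `typing.Protocol`. It is not Marmaray's API (Marmaray is built on the JVM with Apache Spark); `Source`, `Sink`, `ListSource`, `MemorySink`, and `run_pipeline` are hypothetical names used only for illustration.

```python
from typing import Iterable, Iterator, Protocol


class Source(Protocol):
    def read(self) -> Iterator[dict]: ...


class Sink(Protocol):
    def write(self, records: Iterable[dict]) -> int: ...


class ListSource:
    """Hypothetical source plugin backed by an in-memory list."""

    def __init__(self, rows: list[dict]) -> None:
        self.rows = rows

    def read(self) -> Iterator[dict]:
        yield from self.rows


class MemorySink:
    """Hypothetical sink plugin that collects records in memory."""

    def __init__(self) -> None:
        self.stored: list[dict] = []

    def write(self, records: Iterable[dict]) -> int:
        before = len(self.stored)
        self.stored.extend(records)
        return len(self.stored) - before  # number of records written


def run_pipeline(source: Source, sink: Sink) -> int:
    """Any source can be paired with any sink because both share one interface."""
    return sink.write(source.read())


sink = MemorySink()
count = run_pipeline(ListSource([{"trip_id": 1}, {"trip_id": 2}]), sink)
print(count, sink.stored)
# prints 2 [{'trip_id': 1}, {'trip_id': 2}]
```

Adding a new system then means writing one new source or sink class rather than a new point-to-point pipeline for every pair.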


Akka

Build powerful concurrent & distributed applications more easily
PROS OF AKKA
  • 32
    Great concurrency model
  • 17
    Fast
  • 12
    Actor Library
  • 10
    Open source
  • 7
    Resilient
  • 5
    Message driven
  • 5
    Scalable
CONS OF AKKA
  • 3
    Mixing futures with Akka tell is difficult
  • 2
    Closing of futures
  • 2
    No type safety
  • 1
    Very difficult to refactor
  • 1
    Typed actors still not stable
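The "great concurrency model" and "message driven" pros refer to Akka's actor model: each actor owns private state and a mailbox, processes one message at a time, and is reached only by sending it messages. Akka itself runs on the JVM, so the following is a language-neutral illustration in Python `asyncio`, not Akka's API. `CounterActor` and its string messages are hypothetical; `tell` is named after Akka's fire-and-forget send.

```python
import asyncio


class CounterActor:
    """Minimal actor: private state, a mailbox, and one message processed at a time."""

    def __init__(self) -> None:
        self.mailbox: asyncio.Queue = asyncio.Queue()
        self.count = 0  # only touched inside run(), so no locks are needed

    def tell(self, message: str) -> None:
        # Fire-and-forget send, loosely analogous to Akka's `tell`.
        self.mailbox.put_nowait(message)

    async def run(self) -> None:
        while True:
            message = await self.mailbox.get()
            if message == "stop":
                return
            if message == "increment":
                self.count += 1


async def main() -> int:
    actor = CounterActor()
    runner = asyncio.create_task(actor.run())
    for _ in range(1000):
        actor.tell("increment")
    actor.tell("stop")
    await runner
    return actor.count


print(asyncio.run(main()))  # prints 1000
```

Because the mailbox serializes access to `count`, no explicit locking is needed. The same property is also why mixing futures with `tell` (one of the cons above) takes care: a future's callback runs outside the actor's one-message-at-a-time guarantee.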

related Akka posts

To solve the problem of scheduling and executing arbitrary tasks in its distributed infrastructure, PagerDuty created an open-source tool called Scheduler. Scheduler is written in Scala and uses Cassandra for task persistence. It also adds Apache Kafka to handle task queuing and partitioning, with Akka to structure the library’s concurrency.

The service’s logic schedules a task by passing it to the Scheduler’s Scala API, which serializes the task metadata and enqueues it into Kafka. Scheduler then consumes the tasks, and posts them to Cassandra to prevent data loss.
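The flow described here (serialize task metadata, enqueue it into a Kafka partition, consume it, and persist it to Cassandra) can be sketched with in-memory stand-ins: a list of lists for the Kafka partitions and a dict for the Cassandra table. Scheduler's real API is Scala; the `Task` fields, the CRC32 key partitioning, and the idempotent upsert on consume are assumptions chosen to match the description, not PagerDuty's code.

```python
import json
import zlib
from dataclasses import asdict, dataclass


@dataclass
class Task:
    task_id: str
    key: str      # tasks sharing a key keep their relative order (same partition)
    run_at: str   # ISO-8601 timestamp, kept as a string for serialization
    payload: dict


NUM_PARTITIONS = 4
partitions: list[list[bytes]] = [[] for _ in range(NUM_PARTITIONS)]  # stand-in for Kafka
store: dict[str, dict] = {}  # stand-in for the Cassandra table


def schedule(task: Task) -> int:
    """Serialize task metadata and enqueue it on the partition chosen by its key."""
    p = zlib.crc32(task.key.encode("utf-8")) % NUM_PARTITIONS
    partitions[p].append(json.dumps(asdict(task)).encode("utf-8"))
    return p


def consume(partition: int, offset: int) -> int:
    """Persist every message from `offset` onward; return the next offset to read."""
    for raw in partitions[partition][offset:]:
        record = json.loads(raw)
        store[record["task_id"]] = record  # idempotent upsert: replays are harmless
    return len(partitions[partition])


p = schedule(Task("t-1", "incident-9", "2024-01-01T00:05:00Z", {"action": "escalate"}))
next_offset = consume(p, 0)
print(next_offset, store["t-1"]["payload"])
# prints 1 {'action': 'escalate'}
```

Writing to the store before advancing the offset means a crash between the two steps leads to a replay, not a lost task, and the upsert makes that replay safe.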

Shared insights on Akka and Kafka

I decided to use Akka instead of Kafka Streams because I have personal relationships at @Lightbend.


Apache Storm

Distributed and fault-tolerant realtime computation
PROS OF APACHE STORM
  • 10
    Flexible
  • 6
    Easy setup
  • 4
    Event Processing
  • 3
    Clojure
  • 2
    Real Time
CONS OF APACHE STORM
    None listed

    related Apache Storm posts

    Marc Bollinger
    Infra & Data Eng Manager at Thumbtack

    Lumosity is home to the world's largest cognitive training database, a responsibility we take seriously. For most of the company's history, our analysis of user behavior and training data has been powered by an event stream--first a simple Node.js pub/sub app, then a heavyweight Ruby app with stronger durability. Both supported decent throughput and latency, but they lacked some major features supported by existing open-source alternatives: replaying existing messages (also lacking in most message queue-based solutions), scaling out many different readers for the same stream, the ability to leverage existing solutions for reading and writing, and possibly most importantly: the ability to hire someone externally who already had expertise.

    We ultimately migrated to Kafka in early- to mid-2016, citing both industry trends among companies we'd talked to with similar durability and throughput needs and the extremely strong documentation and community. We pored over Kyle Kingsbury's Jepsen post (https://aphyr.com/posts/293-jepsen-Kafka), as well as Jay Kreps' follow-up (http://blog.empathybox.com/post/62279088548/a-few-notes-on-kafka-and-jepsen), talked at length with Confluent folks and community members, and still wound up running parallel systems for quite a long time, but ultimately we've been very, very happy. Understanding the internals and proper levers takes some commitment, but it's taken very little maintenance once configured. Since then, the Confluent Platform community has grown and grown; we've gone from doing most development using custom Scala consumers and producers to being 60/40 Kafka Streams/Connects.

    We originally looked into Storm / Heron, and we'd moved on from Redis pub/sub. Heron looks great, but we already had a programming model across services that was more akin to message consumers than to a topology of bolts, etc. Heron had also just come out while we were starting to migrate things, and the community momentum and direction of Kafka felt more substantial than the older Storm. If we were to start the process over again today, we might check out Pulsar, although the ecosystem is much younger.
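Two of the features this post says the earlier systems lacked, replaying existing messages and scaling out many different readers for the same stream, both follow from a log where each consumer group keeps its own read position. The sketch below models that idea only; `Topic`, `poll`, and `seek` are hypothetical names for a toy class and do not reproduce the Kafka client API. The event names are made up.

```python
class Topic:
    """Append-only log where each consumer group tracks its own read position."""

    def __init__(self) -> None:
        self.messages: list[str] = []
        self.offsets: dict[str, int] = {}  # committed offset per consumer group

    def produce(self, message: str) -> None:
        self.messages.append(message)

    def poll(self, group: str, max_records: int = 10) -> list[str]:
        start = self.offsets.get(group, 0)
        batch = self.messages[start:start + max_records]
        self.offsets[group] = start + len(batch)  # commit after reading
        return batch

    def seek(self, group: str, offset: int) -> None:
        # Replay: rewind one group without affecting any other group.
        self.offsets[group] = offset


topic = Topic()
for event in ["login", "game_played", "logout"]:
    topic.produce(event)

print(topic.poll("analytics"))                       # ['login', 'game_played', 'logout']
print(topic.poll("recommendations", max_records=1))  # ['login']
topic.seek("analytics", 1)
print(topic.poll("analytics"))                       # ['game_played', 'logout']
```

In a traditional queue, a message consumed by one reader is gone for everyone, so neither independent readers nor replay come for free.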

    To find out more, read our 2017 engineering blog post about the migration!


    Apache Flink

    Fast and reliable large-scale data processing engine
    PROS OF APACHE FLINK
    • 16
      Unified batch and stream processing
    • 8
      Easy to use streaming apis
    • 8
      Out-of-the-box connectors to Kinesis, S3, HDFS
    • 4
      Open Source
    • 2
      Low latency
    CONS OF APACHE FLINK
      None listed
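The top pro, "unified batch and stream processing", rests on the idea that a batch can be treated as a bounded stream, so one piece of processing logic serves both. The sketch below illustrates that idea with plain Python generators; it is not Flink's DataStream or Table API, and `running_counts` and `batch` are hypothetical helpers.

```python
from collections import Counter
from typing import Iterable, Iterator


def running_counts(events: Iterable[str]) -> Iterator[dict[str, int]]:
    """One piece of logic for both modes: emit updated counts after every event."""
    counts: Counter[str] = Counter()
    for event in events:
        counts[event] += 1
        yield dict(counts)


def batch(events: list[str]) -> dict[str, int]:
    """Bounded input: run the same logic and keep only the final snapshot."""
    result: dict[str, int] = {}
    for snapshot in running_counts(events):
        result = snapshot
    return result


clicks = ["home", "cart", "home"]
print(batch(clicks))                   # {'home': 2, 'cart': 1}
stream = running_counts(iter(clicks))  # unbounded style: results as events arrive
print(next(stream))                    # {'home': 1}
```

A real engine adds what this sketch lacks (state checkpoints, event-time windows, parallelism), but the same contract holds: streaming emits intermediate results, batch waits for the input to end.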

      related Apache Flink posts

      Surabhi Bhawsar
      Technical Architect at Pepcus
      Shared insights on Kafka and Apache Flink

      I need to build an alert and notification framework using a scheduled program. We will analyze the events from the database table, filter the events that fall within a one-day timespan, and send these event messages by email. Currently, we are using Kafka pub/sub for messaging. The customer wants us to move to Apache Flink, and I am trying to understand how Apache Flink could be a better fit for us.
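The core selection step in this question, keeping only the events that fall within the last day before sending the email digest, is small enough to sketch directly. This is plain Python rather than Flink code; the `Event` fields are hypothetical, the half-open window (now minus one day, now] is an assumption, and loading from the database table and sending email are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Event:
    event_id: int
    occurred_at: datetime


def events_in_last_day(events: list[Event], now: datetime) -> list[Event]:
    """Keep events whose timestamp falls in the half-open window (now - 1 day, now]."""
    window_start = now - timedelta(days=1)
    return [e for e in events if window_start < e.occurred_at <= now]


now = datetime(2024, 5, 2, 9, 0, tzinfo=timezone.utc)
events = [
    Event(1, now - timedelta(hours=2)),
    Event(2, now - timedelta(days=2)),
    Event(3, now - timedelta(hours=23)),
]
print([e.event_id for e in events_in_last_day(events, now)])  # [1, 3]
```

Run once a day by a scheduler, this is a batch job; in Flink the equivalent would typically be expressed as a one-day window over an event stream, which matters mainly if alerts must go out as events arrive rather than on a fixed schedule.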


      I have to build a data processing application with an Apache Beam stack and the Apache Flink runner on an Amazon EMR cluster. I have seen some instability with the process, and the EMR clusters keep going down. Here, the Apache Beam application gets inputs from Kafka and sends the accumulated data streams to another Kafka topic. Any advice on how to make the process more stable?


      Redis

      Open source (BSD licensed), in-memory data structure store
      PROS OF REDIS
      • 886
        Performance
      • 542
        Super fast
      • 513
        Ease of use
      • 444
        In-memory cache
      • 324
        Advanced key-value cache
      • 194
        Open source
      • 182
        Easy to deploy
      • 164
        Stable
      • 155
        Free
      • 121
        Fast
      • 42
        High-Performance
      • 40
        High Availability
      • 35
        Data Structures
      • 32
        Very Scalable
      • 24
        Replication
      • 22
        Great community
      • 22
        Pub/Sub
      • 19
        "NoSQL" key-value data store
      • 16
        Hashes
      • 13
        Sets
      • 11
        Sorted Sets
      • 10
        NoSQL
      • 10
        Lists
      • 9
        Async replication
      • 9
        BSD licensed
      • 8
        Bitmaps
      • 8
        Integrates super easy with Sidekiq for Rails background
      • 7
        Keys with a limited time-to-live
      • 7
        Open Source
      • 6
        Lua scripting
      • 6
        Strings
      • 5
        Awesomeness for Free
      • 5
        Hyperloglogs
      • 4
        Transactions
      • 4
        Outstanding performance
      • 4
        Runs server-side Lua
      • 4
        LRU eviction of keys
      • 4
        Feature Rich
      • 4
        Written in ANSI C
      • 4
        Networked
      • 3
        Data structure server
      • 3
        Performance & ease of use
      • 2
        Doesn't save data if no subscribers are found
      • 2
        Automatic failover
      • 2
        Easy to use
      • 2
        Temporarily kept on disk
      • 2
        Scalable
      • 2
        Existing Laravel Integration
      • 2
        Channels concept
      • 2
        Objects (keys/values) up to 500 MB each
      • 2
        Simple
      CONS OF REDIS
      • 15
        Cannot query objects directly
      • 3
        No secondary indexes for non-numeric data types
      • 1
        No WAL

      related Redis posts

      Robert Zuber

      We use MongoDB as our primary #datastore. Mongo's approach to replica sets enables some fantastic patterns for operations like maintenance, backups, and #ETL.

      As we pull #microservices from our #monolith, we are taking the opportunity to build them with their own datastores using PostgreSQL. We also use Redis to cache data we’d never store permanently, and to rate-limit our requests to partners’ APIs (like GitHub).
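The post does not say how Redis is used to rate-limit calls to partners' APIs. One common pattern is a fixed-window counter: `INCR` a per-partner key for the current window and `EXPIRE` it after the window length. The sketch below emulates that pattern in memory so it runs without a Redis server; the class name and key format are hypothetical, and unlike Redis `EXPIRE`, old window keys are never evicted here.

```python
import time
from typing import Callable


class FixedWindowRateLimiter:
    """In-memory stand-in for the Redis INCR + EXPIRE fixed-window pattern."""

    def __init__(self, limit: int, window_seconds: int,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.counters: dict[str, int] = {}  # key -> count, like a Redis string counter

    def allow(self, partner: str) -> bool:
        window_id = int(self.clock() // self.window)
        key = f"ratelimit:{partner}:{window_id}"  # a new key for every window
        count = self.counters.get(key, 0) + 1     # Redis: INCR key (then EXPIRE key)
        self.counters[key] = count
        return count <= self.limit


fake_now = [1000.0]
limiter = FixedWindowRateLimiter(limit=2, window_seconds=60, clock=lambda: fake_now[0])
print([limiter.allow("github") for _ in range(3)])  # [True, True, False]
fake_now[0] += 60
print(limiter.allow("github"))                      # True: a new window has started
```

A fixed window can let up to twice the limit through around a window boundary; sliding-window variants trade that for more bookkeeping.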

      When we’re dealing with large blobs of immutable data (logs, artifacts, and test results), we store them in Amazon S3. We handle any side-effects of S3’s eventual consistency model within our own code. This ensures that we deal with user requests correctly while writes are in process.


      I'm one of the engineering leads at RunaHR. As our platform is a SaaS, we thought it'd be good to have an API (we chose Ruby on Rails for this) connected to a SPA (built with React and Redux). We started the SPA with Create React App since it's pretty easy to get started with.

      We use Jest as the testing framework and react-testing-library to test React components. In Rails we make tests using RSpec.

      Our main database is PostgreSQL, but we also use MongoDB to store some types of data. We started to use Redis for caching and other time-sensitive operations.

      We have a couple of extra projects: One is an Employee app built with React Native and the other is an internal back office dashboard built with Next.js for the client and Python in the backend side.

      Since we have several frontend apps, we have found it useful to use Bit to document visual components and utils in JavaScript.
