Kafka vs Serverless

Overview

Kafka: 24.2K stacks, 22.3K followers, 607 votes, 31.2K GitHub stars, 14.8K forks

Serverless: 2.2K stacks, 1.2K followers, 28 votes, 46.9K GitHub stars, 5.7K forks

Kafka vs Serverless: What are the differences?

Introduction

Kafka and Serverless are two popular technologies in software development. Kafka is a distributed event streaming platform focused on storing and processing streams of data efficiently and reliably. Serverless (both the Serverless Framework and the serverless compute services it targets, such as AWS Lambda) lets developers build and run applications without managing the underlying infrastructure. The key differences between them are outlined below.

Scalability: Kafka scales horizontally by splitting each topic into partitions and spreading them across multiple brokers (servers), which lets it handle large data volumes and high workloads; adding capacity means adding brokers and partitions. Serverless platforms, on the other hand, scale automatically with demand, running more function instances as traffic rises and fewer as it falls.
Event-driven vs Function-as-a-Service: Kafka is an event streaming platform in which applications and services produce events (messages) to topics and other applications consume them, which makes it well suited to real-time data pipelines and streaming analytics. In contrast, serverless platforms such as AWS Lambda run functions in response to specific events or triggers, without requiring the developer to manage server infrastructure.
Data Storage and Processing: Kafka stores data durably and fault-tolerantly, replicating partitions across brokers and retaining records after they are read, so consumers can rewind and replay a stream. Processing happens in consumer applications or with libraries such as Kafka Streams, covering real-time, batch-style, and complex event processing. Serverless functions, by contrast, are stateless and have no built-in persistent storage; they integrate with external systems such as databases and object storage to keep state.
Infrastructure Management: Kafka requires setting up and managing a cluster of servers, including the configuration, monitoring, and maintenance of the infrastructure. In contrast, Serverless platforms handle the infrastructure management, allowing developers to focus solely on writing and deploying code. This eliminates the need to provision and manage servers, making it easier to develop and deploy applications.
Cost Model: Kafka's cost depends on the infrastructure needed to run and maintain the cluster, including hardware, networking, and maintenance, and it is paid whether or not traffic is flowing. Serverless platforms follow a pay-as-you-go model: you are billed per invocation and for the execution time and resources (such as memory) each invocation uses. This makes serverless more cost-effective for sporadic or unpredictable workloads, although per-invocation charges can add up under sustained high volume.
Latency and Real-time processing: Kafka delivers low-latency processing, making it suitable for real-time data streaming and analytics; applications can react to events as they happen. Serverless platforms can add latency, most notably cold starts, when a new function instance must be initialized before it can handle a request. For most use cases this overhead is negligible, but it can matter for applications that require immediate processing and response.
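Kafka's horizontal scaling rests on partitioning: each topic is split into partitions spread across brokers, and a record's key decides which partition it lands on, so records with the same key stay in order on one partition. The sketch below illustrates hash-based key assignment. It is an assumption-laden stand-in: it uses `zlib.crc32`, whereas the Java client's default partitioner hashes keyed records with murmur2, so real assignments will differ. The function names are hypothetical.

```python
import zlib
from collections import defaultdict


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a message key to a partition; equal keys always map to the same partition.

    Assumption: crc32 is used as a stand-in hash. Kafka's default partitioner
    uses murmur2, so this does not reproduce real Kafka assignments.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def distribute(keys, num_partitions: int) -> dict:
    """Group keys by the partition they hash to."""
    layout = defaultdict(list)
    for key in keys:
        layout[choose_partition(key, num_partitions)].append(key)
    return dict(layout)


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10)]
    print(distribute(users, 3))
```

Because the mapping is "hash mod partition count", increasing the number of partitions moves existing keys to different partitions, which is one reason Kafka partition counts are usually planned up front.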
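The two models also meet in practice: AWS Lambda can be triggered by a Kafka topic, in which case the platform polls Kafka and invokes the function with a batch of records, and the function never manages a consumer itself. The handler below is a minimal sketch under an assumed, simplified event shape (records grouped under "topic-partition" keys, values base64-encoded JSON) modeled on what Lambda passes for Kafka event sources; check the AWS documentation for the exact format before relying on it.

```python
import base64
import json


def lambda_handler(event, context):
    """Decode each Kafka record delivered in one Lambda invocation.

    Assumption: `event` follows a simplified version of Lambda's Kafka event
    shape: {"records": {"<topic>-<partition>": [{"offset": ..., "value": <base64>}]}}.
    """
    processed = []
    for topic_partition, records in event.get("records", {}).items():
        for record in records:
            # Values arrive base64-encoded; here we assume they contain JSON.
            payload = json.loads(base64.b64decode(record["value"]))
            processed.append(
                {"source": topic_partition, "offset": record["offset"], "payload": payload}
            )
    return {"processed": len(processed), "items": processed}
```

The function only reacts to the batch it is handed; scaling, polling, and offset tracking are left to the platform, which is the Function-as-a-Service side of the comparison.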
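The ability to rewind and replay comes from Kafka's log model: reading a record does not remove it, and each consumer simply tracks the offset of the next record it will read. The in-memory sketch below shows that idea for a single partition. `PartitionLog` and `Consumer` are hypothetical names, not Kafka client classes, and real Kafka keeps records according to a retention policy (by time or size) rather than forever.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class PartitionLog:
    """Append-only log for one partition; records stay after being read."""
    records: List[Any] = field(default_factory=list)

    def append(self, value: Any) -> int:
        # A record's offset is simply its position in the log.
        self.records.append(value)
        return len(self.records) - 1

    def read(self, offset: int, max_records: int = 10) -> List[Any]:
        # Reading never deletes anything, unlike a classic work queue.
        return self.records[offset:offset + max_records]


@dataclass
class Consumer:
    log: PartitionLog
    position: int = 0  # offset of the next record this consumer will read

    def poll(self, max_records: int = 10) -> List[Any]:
        batch = self.log.read(self.position, max_records)
        self.position += len(batch)
        return batch

    def seek(self, offset: int) -> None:
        # Rewinding only moves this consumer's position; the data is still there.
        self.position = offset
```

Because each consumer owns its position, several independent consumers can read the same log at different speeds, and any of them can replay history with `seek` without affecting the others.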
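The cost trade-off can be made concrete with a little arithmetic: a pay-per-use bill grows with invocations, while a cluster costs roughly the same each month. The sketch below estimates a Lambda-style bill as a request charge plus a compute charge in GB-seconds, and finds the monthly volume at which it matches a flat cluster cost. The default prices are assumptions for illustration only (check current pricing for your provider, region, and architecture), free tiers are ignored, and the Kafka side is reduced to one hypothetical flat monthly figure you supply.

```python
def lambda_monthly_cost(invocations, avg_duration_ms, memory_mb,
                        price_per_million_requests=0.20,
                        price_per_gb_second=0.0000166667):
    """Estimate a pay-per-use bill: request charge plus compute charge.

    The default prices are illustrative assumptions, not authoritative pricing.
    """
    request_cost = invocations / 1_000_000 * price_per_million_requests
    # Compute is billed on memory (in GB) multiplied by duration (in seconds).
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return request_cost + gb_seconds * price_per_gb_second


def break_even_invocations(fixed_monthly_cost, avg_duration_ms, memory_mb, **prices):
    """Monthly invocations at which pay-per-use equals a fixed-cost cluster."""
    per_invocation = lambda_monthly_cost(1, avg_duration_ms, memory_mb, **prices)
    return fixed_monthly_cost / per_invocation
```

With the assumed defaults, a function running 100 ms at 512 MB costs about $0.000001 per invocation, so a hypothetical $500/month cluster breaks even at roughly 484 million invocations a month. Below that volume the pay-per-use model is cheaper; above it, the flat cost wins, which matches the "cheap, up to a limit" experience reported in the advice below.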

In summary, Kafka is a distributed event streaming platform with scalable, durable storage and processing capabilities, while serverless platforms execute functions in response to specific events without the developer managing the underlying infrastructure. Kafka offers low-latency real-time processing, whereas serverless platforms offer easier infrastructure management and costs that scale with actual usage.


Advice on Kafka, Serverless

Tim, CTO at Checkly Inc.

Sep 18, 2019

Needs advice on: Heroku, AWS Lambda

When adding a new feature to Checkly or rearchitecting an older piece, I tend to pick Heroku for rolling it out. But not always, because sometimes I pick AWS Lambda. The short story:

Developer Experience trumps everything.
AWS Lambda is cheap. Up to a limit, though. This impacts not only your wallet.
If you need geographic spread, AWS is lonely at the top.

The setup

Recently, I was doing a brainstorm at a startup here in Berlin on the future of their infrastructure. They were ready to move on from their initial, almost 100% EC2 + Chef based setup. Everything was on the table. But we crossed out a lot quite quickly:

Pure, uncut, self hosted Kubernetes — way too much complexity
Managed Kubernetes in various flavors — still too much complexity
Zeit — Maybe, but no Docker support
Elastic Beanstalk — Maybe, bit old but does the job
Heroku
Lambda

It became clear a mix of PaaS and FaaS was the way to go. What a surprise! That is exactly what I use for Checkly! But when do you pick which model?

I chopped that question up into the following categories:

Developer Experience / DX 🤓
Ops Experience / OX 🐂 (?)
Cost 💵
Lock in 🔐

Read the full post linked below for all details


viradiya

Apr 12, 2020

Needs advice on: AngularJS, ASP.NET Core, MSSQL

We are going to develop a microservices-based application. It consists of AngularJS, ASP.NET Core, and MSSQL.

We have three types of microservices: Emailservice, Filemanagementservice, and Filevalidationservice.

I am a beginner with microservices. I have read about RabbitMQ, but I have since learned that Redis and Kafka are also options. So I want to know which is best.


Ishfaq

Feb 28, 2020

Needs advice

Our backend application sends external messages to a third-party application at the end of each backend (CRUD) API call made from the UI. These external messages take a lot of extra time (message building, processing, sending to the third party, and logging success or failure), and the UI application has no interest in them.

Currently, we send these third-party messages from a new child thread created at the end of each REST API call, so the UI application doesn't wait for the extra third-party API calls.

I want to integrate Apache Kafka for these third-party API calls, so that failed calls can be retried from a queue and logged, etc. (Currently, third-party messages are sent from multiple threads at the same time, which uses too much processing power and too many resources.)

Question 1: Is this a use case of a message broker?

Question 2: If it is, which is better: Kafka or RabbitMQ?
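The pattern described in this question (hand the message off, return to the UI immediately, and let a background consumer send it with retries and logging) can be sketched with Python's standard library. This is an in-process stand-in under stated assumptions: `queue.Queue` plays the role of the broker, and `send_with_retry` and `start_worker` are hypothetical names. With Kafka or RabbitMQ, the queue would be durable and would survive application restarts, which this sketch does not.

```python
import logging
import queue
import threading
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("third_party_outbox")


def send_with_retry(send, message, max_attempts=4, base_delay=0.01):
    """Call send(message), retrying with exponential backoff; True on success."""
    for attempt in range(1, max_attempts + 1):
        try:
            send(message)
            log.info("sent %r on attempt %d", message, attempt)
            return True
        except Exception as exc:  # third-party failures are logged, not raised
            log.warning("attempt %d for %r failed: %s", attempt, message, exc)
            if attempt < max_attempts:
                time.sleep(base_delay * 2 ** (attempt - 1))
    log.error("giving up on %r", message)
    return False


def start_worker(jobs, send, results):
    """One background thread drains the queue, instead of one thread per API call."""
    def run():
        while True:
            message = jobs.get()
            if message is None:  # sentinel value: stop the worker
                jobs.task_done()
                break
            results.append((message, send_with_retry(send, message)))
            jobs.task_done()

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker
```

In this design, the REST handler would only call `jobs.put(message)` and return, so the request path never waits on the third party, and a single worker bounds how many third-party calls run at once.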


Detailed Comparison

Description

Kafka: Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design.

Serverless: Build applications composed of microservices that run in response to events, auto-scale for you, and only charge you when they run. This lowers the total cost of maintaining your apps, enabling you to build more logic, faster. The Framework uses new event-driven compute services, like AWS Lambda, Google Cloud Functions, and more.

Key features

Kafka: Written at LinkedIn in Scala; used by LinkedIn to offload processing of all page and other views; defaults to using persistence and uses the OS disk cache for hot data (higher throughput than comparable messaging systems with persistence enabled); supports both online and offline processing.

Serverless: none listed
Pros & Cons (vote counts in parentheses)

Kafka pros: High-throughput (126), Distributed (119), Scalable (92), High-Performance (86), Durable (66)

Kafka cons: Non-Java clients are second-class citizens (32), Needs ZooKeeper (29), Operational difficulties (9), Terrible Packaging (5)

Serverless pros: API integration (14), Supports cloud functions for Google, Azure, and IBM (7), Lower cost (3), Openwhisk (1), Auto scale (1)

Integrations

Kafka: no integrations available

Serverless: Azure Functions, AWS Lambda, Amazon API Gateway

What are some alternatives to Kafka and Serverless?

RabbitMQ

RabbitMQ gives your applications a common platform to send and receive messages, and your messages a safe place to live until received.

AWS Lambda

AWS Lambda is a compute service that runs your code in response to events and automatically manages the underlying compute resources for you. You can use AWS Lambda to extend other AWS services with custom logic, or create your own back-end services that operate at AWS scale, performance, and security.

Celery

Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.

Amazon SQS

Transmit any volume of data, at any level of throughput, without losing messages or requiring other services to be always available. With SQS, you can offload the administrative burden of operating and scaling a highly available messaging cluster, while paying a low price for only what you use.

NSQ

NSQ is a realtime distributed messaging platform designed to operate at scale, handling billions of messages per day. It promotes distributed and decentralized topologies without single points of failure, enabling fault tolerance and high availability coupled with a reliable message delivery guarantee.

ActiveMQ

Apache ActiveMQ is fast, supports many Cross Language Clients and Protocols, comes with easy to use Enterprise Integration Patterns and many advanced features while fully supporting JMS 1.1 and J2EE 1.4. Apache ActiveMQ is released under the Apache 2.0 License.

ZeroMQ

The 0MQ lightweight messaging kernel is a library which extends the standard socket interfaces with features traditionally provided by specialised messaging middleware products. 0MQ sockets provide an abstraction of asynchronous message queues, multiple messaging patterns, message filtering (subscriptions), seamless access to multiple transport protocols and more.

Apache NiFi

An easy to use, powerful, and reliable system to process and distribute data. It supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic.

Azure Functions

Azure Functions is an event-driven, compute-on-demand experience that extends the existing Azure application platform with capabilities to implement code triggered by events occurring in virtually any Azure or third-party service as well as on-premises systems.

Google Cloud Run

A managed compute platform that enables you to run stateless containers that are invocable via HTTP requests. It's serverless by abstracting away all infrastructure management.
