AWS Lambda vs Kafka

Overview

Kafka

Stacks: 24.2K
Followers: 22.3K
Votes: 607
GitHub Stars: 31.2K
Forks: 14.8K

AWS Lambda

Stacks: 26.0K
Followers: 18.8K
Votes: 432

AWS Lambda vs Kafka: What are the differences?

Introduction

This article covers the key differences between AWS Lambda, a serverless compute service, and Kafka, a distributed platform for real-time data streaming.

Scalability: AWS Lambda is a serverless compute service that scales automatically with incoming requests, running many function instances concurrently up to the account's concurrency limit. Kafka is a distributed streaming platform built for high throughput that scales horizontally by adding brokers and partitions. A suitably sized cluster can handle millions of events per second, which makes it suitable for real-time data processing at scale.
Event-driven architecture: AWS Lambda follows an event-driven model in which functions are triggered by events such as changes in data or incoming API requests. You write small, individual functions that each perform a specific task, which gives fine-grained control over application logic. Kafka is a messaging system that decouples the components that produce events from the components that consume them. Multiple consumers can process the same event independently, which supports event-driven architecture at a larger scale.
Data persistence: AWS Lambda functions are stateless by default, meaning they do not retain any state between invocations. To persist data between invocations, you need an external storage service such as a database or object store. Kafka provides built-in durable, fault-tolerant persistence. It stores events in a distributed, replicated log, so data stays durable and available even when individual brokers fail.
Processing latency: AWS Lambda offers near-real-time processing, but the actual latency varies with factors such as the size of the function code, cold start delays, and network overhead. Kafka is designed for low-latency delivery, typically on the order of milliseconds on a well-tuned cluster, which makes it suitable for applications that require real-time data processing with minimal delays.
Integration with other services: AWS Lambda integrates with many other AWS services, such as API Gateway, S3, and DynamoDB. It provides triggers and event sources for these services, allowing you to build complex serverless architectures. Kafka provides a wide range of client libraries and connectors for integrating with other systems and frameworks, which makes it a versatile choice for building data processing pipelines.
Complexity and learning curve: AWS Lambda is a managed service, so AWS takes care of infrastructure management and scaling. This simplifies development and reduces operational overhead. Kafka, when self-hosted, requires setting up and managing a cluster, which involves configuring brokers, topics, partitions, and more. This is more complex and requires some understanding of distributed systems.
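
To make the event-driven and persistence points above more concrete, here is a minimal sketch of the log model Kafka is built around. It is a toy in-memory stand-in, not the Kafka client API: the ToyLog class, its methods, and the group names are all hypothetical, and the key-to-partition rule is a simplification chosen to keep the example deterministic. It only illustrates that events stay in the log after being read and that each consumer group tracks its own position.

```python
from collections import defaultdict


class ToyLog:
    """In-memory stand-in for a single topic; NOT the Kafka client API."""

    def __init__(self, num_partitions=2):
        # Each partition is an append-only list of events.
        self.partitions = [[] for _ in range(num_partitions)]
        # Read positions, tracked per consumer group and per partition.
        self.offsets = defaultdict(lambda: [0] * num_partitions)

    def append(self, key, value):
        # Events with the same key always land in the same partition,
        # so their relative order is preserved. A byte sum is used here
        # only to keep the example deterministic.
        partition = sum(key.encode()) % len(self.partitions)
        self.partitions[partition].append((key, value))
        return partition

    def poll(self, group):
        # Return every event this group has not yet seen, then advance
        # only this group's offsets. The events themselves stay in the log.
        unread = []
        for p, events in enumerate(self.partitions):
            start = self.offsets[group][p]
            unread.extend(events[start:])
            self.offsets[group][p] = len(events)
        return unread


log = ToyLog()
log.append("order-1", "created")
log.append("order-2", "created")
log.append("order-1", "paid")

billing = log.poll("billing")        # first group reads all three events
analytics = log.poll("analytics")    # second group reads the same events independently
billing_again = log.poll("billing")  # nothing new for the first group
```

Reading does not remove anything, so a new consumer group can be added later and still see the full history. A real cluster adds replication across brokers and retention limits, which this sketch leaves out.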

In summary, AWS Lambda is a serverless compute service that provides automatic scaling and an event-driven execution model, while Kafka is a distributed streaming platform designed for high throughput and real-time data processing. Lambda suits event-triggered, short-lived tasks where low operational overhead matters more than consistent latency, while Kafka suits large-scale, high-throughput pipelines that need durable event storage and multiple independent consumers.
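
For comparison, the Lambda side of the picture can be sketched as a single stateless function. The (event, context) signature is the shape Lambda's Python runtime calls; everything else here is an assumption for illustration, including the "records" and "amount" fields of the event and the shape of the returned dictionary.

```python
import json


def lambda_handler(event, context):
    """Handler in the (event, context) shape used by Lambda's Python runtime."""
    # All input arrives in the event; nothing is remembered from earlier calls.
    # "records" and "amount" are hypothetical fields, not a real AWS event format.
    records = event.get("records", [])
    total = sum(r["amount"] for r in records)
    return {
        "statusCode": 200,
        "body": json.dumps({"count": len(records), "total": total}),
    }


# Local invocation with a made-up event; the context object is unused here.
sample_event = {"records": [{"amount": 5}, {"amount": 7}]}
response = lambda_handler(sample_event, None)
```

Because the function keeps no state of its own, any result that must outlive the invocation would have to be written to an external store, which is the trade-off the data persistence point describes.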


Advice on Kafka, AWS Lambda

Tim

CTO at Checkly Inc.

Sep 18, 2019

Needs advice on: Heroku, AWS Lambda

When adding a new feature to Checkly or rearchitecting some older piece, I tend to pick Heroku for rolling it out. But not always, because sometimes I pick AWS Lambda. The short story:

Developer Experience trumps everything.
AWS Lambda is cheap. Up to a limit though. This impacts not only your wallet.
If you need geographic spread, AWS is lonely at the top.

The setup

Recently, I was doing a brainstorm at a startup here in Berlin on the future of their infrastructure. They were ready to move on from their initial, almost 100% EC2 + Chef-based setup. Everything was on the table. But we crossed out a lot quite quickly:

Pure, uncut, self hosted Kubernetes — way too much complexity
Managed Kubernetes in various flavors — still too much complexity
Zeit — Maybe, but no Docker support
Elastic Beanstalk — Maybe, bit old but does the job
Heroku
Lambda

It became clear a mix of PaaS and FaaS was the way to go. What a surprise! That is exactly what I use for Checkly! But when do you pick which model?

I chopped that question up into the following categories:

Developer Experience / DX 🤓
Ops Experience / OX 🐂 (?)
Cost 💵
Lock in 🔐

Read the full post linked below for all details

357k views

viradiya

Apr 12, 2020

Needs advice on: AngularJS, ASP.NET Core, MSSQL

We are going to develop a microservices-based application. It consists of AngularJS, ASP.NET Core, and MSSQL.

We have 3 types of microservices: Emailservice, Filemanagementservice, and Filevalidationservice.

I am a beginner in microservices. I have read about RabbitMQ, but I have learned that Redis and Kafka are also available, so I want to know which is best.

934k views

Mark

Nov 2, 2020

Needs advice on: Microsoft Azure

Need advice on what platform, systems and tools to use.

Evaluating whether to start a new digital business for which we will need to build a website that handles all traffic. Website only right now. May add smartphone apps later. No desktop app will ever be added. Website to serve various countries and languages. B2B and B2C type customers. Need to handle heavy traffic, be low cost, and scale well.

We are open to building it on either AWS or Microsoft Azure.

Apologies if I'm leaving out some info. My first post. :) Thanks in advance!

133k views

Detailed Comparison

Kafka	AWS Lambda
Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design.	AWS Lambda is a compute service that runs your code in response to events and automatically manages the underlying compute resources for you. You can use AWS Lambda to extend other AWS services with custom logic, or create your own back-end services that operate at AWS scale, performance, and security.
Written at LinkedIn in Scala; Used by LinkedIn to offload processing of all page and other views; Defaults to using persistence, uses OS disk cache for hot data (has higher throughput than any of the above having persistence enabled); Supports both online and offline processing	Extend other AWS services with custom logic; Build custom back-end services; Completely Automated Administration; Built-in Fault Tolerance; Automatic Scaling; Integrated Security Model; Bring Your Own Code; Pay Per Use; Flexible Resource Model
Statistics
GitHub Stars 31.2K	GitHub Stars -
GitHub Forks 14.8K	GitHub Forks -
Stacks 24.2K	Stacks 26.0K
Followers 22.3K	Followers 18.8K
Votes 607	Votes 432
Pros & Cons
Kafka
Pros: High-throughput (126); Distributed (119); Scalable (92); High-Performance (86); Durable (66)
Cons: Non-Java clients are second-class citizens (32); Needs Zookeeper (29); Operational difficulties (9); Terrible Packaging (5)

AWS Lambda
Pros: No infrastructure (129); Cheap (83); Quick (70); Stateless (59); No deploy, no server, great sleep (47)
Cons: Can't execute Ruby or Go (7); Compute time limited (3); Can't execute PHP w/o significant effort (1)

What are some alternatives to Kafka, AWS Lambda?

RabbitMQ

RabbitMQ gives your applications a common platform to send and receive messages, and your messages a safe place to live until received.

Celery

Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.

Amazon SQS

Transmit any volume of data, at any level of throughput, without losing messages or requiring other services to be always available. With SQS, you can offload the administrative burden of operating and scaling a highly available messaging cluster, while paying a low price for only what you use.

NSQ

NSQ is a realtime distributed messaging platform designed to operate at scale, handling billions of messages per day. It promotes distributed and decentralized topologies without single points of failure, enabling fault tolerance and high availability coupled with a reliable message delivery guarantee.

ActiveMQ

Apache ActiveMQ is fast, supports many Cross Language Clients and Protocols, comes with easy to use Enterprise Integration Patterns and many advanced features while fully supporting JMS 1.1 and J2EE 1.4. Apache ActiveMQ is released under the Apache 2.0 License.

ZeroMQ

The 0MQ lightweight messaging kernel is a library which extends the standard socket interfaces with features traditionally provided by specialised messaging middleware products. 0MQ sockets provide an abstraction of asynchronous message queues, multiple messaging patterns, message filtering (subscriptions), seamless access to multiple transport protocols and more.

Apache NiFi

An easy to use, powerful, and reliable system to process and distribute data. It supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic.

Azure Functions

Azure Functions is an event driven, compute-on-demand experience that extends the existing Azure application platform with capabilities to implement code triggered by events occurring in virtually any Azure or 3rd party service as well as on-premises systems.

Google Cloud Run

A managed compute platform that enables you to run stateless containers that are invocable via HTTP requests. It's serverless by abstracting away all infrastructure management.

Gearman

Gearman allows you to do work in parallel, to load balance processing, and to call functions between languages. It can be used in a variety of applications, from high-availability web sites to the transport of database replication events.
