Kafka vs Serverless: What are the differences?
What is Kafka? Distributed, fault tolerant, high throughput pub-sub messaging system. Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design.
What is Serverless? The most widely-adopted toolkit for building serverless applications. Build applications comprised of microservices that run in response to events, auto-scale for you, and only charge you when they run. This lowers the total cost of maintaining your apps, enabling you to build more logic, faster. The Framework uses new event-driven compute services, like AWS Lambda, Google CloudFunctions, and more.
Kafka belongs to "Message Queue" category of the tech stack, while Serverless can be primarily classified under "Serverless / Task Processing".
"High-throughput" is the top reason why over 95 developers like Kafka, while over 10 developers mention "API integration " as the leading cause for choosing Serverless.
Kafka and Serverless are both open source tools. It seems that Serverless with 30.5K GitHub stars and 3.38K forks on GitHub has more adoption than Kafka with 12.5K GitHub stars and 6.7K GitHub forks.
Slack, Shopify, and SendGrid are some of the popular companies that use Kafka, whereas Serverless is used by Droplr, Plista GmbH, and Hammerhead. Kafka has a broader approval, being mentioned in 501 company stacks & 451 developers stacks; compared to Serverless, which is listed in 112 company stacks and 43 developer stacks.
What is Kafka?
What is Serverless?
Need advice about which tool to choose?Ask the StackShare community!
Sign up to add, upvote and see more prosMake informed product decisions
What are the cons of using Serverless?
Sign up to get full access to all the companiesMake informed product decisions
Sign up to get full access to all the tool integrationsMake informed product decisions
At Epsagon, we use hundreds of AWS Lambda functions, most of them are written in Python, and the Serverless Framework to pack and deploy them. One of the issues we've encountered is the difficulty to package external libraries into the Lambda environment using the Serverless Framework. This limitation is probably by design since the external code your Lambda needs can be usually included with a package manager.
In order to overcome this issue, we've developed a tool, which we also published as open-source (see link below), which automatically packs these libraries using a simple npm package and a YAML configuration file. Support for Node.js, Go, and Java will be available soon.
The GitHub respoitory: https://github.com/epsagon/serverless-package-external
Why we built Marmaray, an open source generic data ingestion and dispersal framework and library for Apache Hadoop :
Built and designed by our Hadoop Platform team, Marmaray is a plug-in-based framework built on top of the Hadoop ecosystem. Users can add support to ingest data from any source and disperse to any sink leveraging the use of Apache Spark . The name, Marmaray, comes from a tunnel in Turkey connecting Europe and Asia. Similarly, we envisioned Marmaray within Uber as a pipeline connecting data from any source to any sink depending on customer preference:
(Direct GitHub repo: https://github.com/uber/marmaray Kafka Kafka Manager )
In a couple of recent projects we had an opportunity to try out the new Serverless approach to building web applications. It wasn't necessarily a question if we should use any particular vendor but rather "if" we can consider serverless a viable option for building apps. Obviously our goal was also to get a feel for this technology and gain some hands-on experience.
We did consider AWS Lambda, Firebase from Google as well as Azure Functions. Eventually we went with AWS Lambdas.PROS
- No servers to manage (obviously!)
- Limited fixed costs – you pay only for used time
- Automated scaling and balancing
- Automatic failover (or, at this level of abstraction, no failover problem at all)
- Security easier to provide and audit
- Low overhead at the start (with the certain level of knowledge)
- Short time to market
- Easy handover - deployment coupled with code
- Perfect choice for lean startups with fast-paced iterations
- Augmentation for the classic cloud, server(full) approach
- Not much know-how and best practices available about structuring the code and projects on the market
- Not suitable for complex business logic due to the risk of producing highly coupled code
- Cost difficult to estimate (helpful tools: serverlesscalc.com)
- Difficulty in migration to other platforms (Vendor lock⚠️)
- Little engineers with experience in serverless on the job market
- Steep learning curve for engineers without any cloud experience
More details are on our blog: https://evojam.com/blog/2018/12/5/should-you-go-serverless-meet-the-benefits-and-flaws-of-new-wave-of-cloud-solutions I hope it helps 🙌 & I'm curious of your experiences.
I use Kafka because it has almost infinite scaleability in terms of processing events (could be scaled to process hundreds of thousands of events), great monitoring (all sorts of metrics are exposed via JMX).
Downsides of using Kafka are: - you have to deal with Zookeeper - you have to implement advanced routing yourself (compared to RabbitMQ it has no advanced routing)
The question for which Message Queue to use mentioned "availability, distributed, scalability, and monitoring". I don't think that this excludes many options already. I does not sound like you would take advantage of Kafka's strengths (replayability, based on an even sourcing architecture). You could pick one of the AMQP options.
I would recommend the RabbitMQ message broker, which not only implements the AMQP standard 0.9.1 (it can support 1.x or other protocols as well) but has also several very useful extensions built in. It ticks the boxes you mentioned and on top you will get a very flexible system, that allows you to build the architecture, pick the options and trade-offs that suite your case best.
For more information about RabbitMQ, please have a look at the linked markdown I assembled. The second half explains many configuration options. It also contains links to managed hosting and to libraries (though it is missing Python's - which should be Puka, I assume).
Which #IaaS / #PaaS to chose? Not all #Cloud providers are created equal. As you start to use one or the other, you'll build around very specific services that don't have their equivalent elsewhere.
Back in 2014/2015, this decision I made for SmartZip was a no-brainer and #AWS won. AWS has been a leader, and over the years demonstrated their capacity to innovate, and reducing toil. Like no other.
Year after year, this kept on being confirmed, as they rolled out new (managed) services, got into Serverless with AWS Lambda / FaaS And allowed domains such as #AI / #MachineLearning to be put into the hands of every developers thanks to Amazon Machine Learning or Amazon SageMaker for instance.
Should you compare with #GCP for instance, it's not quite there yet. Building around these managed services, #AWS allowed me to get my developers on a whole new level. Where they know what's under the hood. Where they know they have these services available and can build around them. Where they care and are responsible for operations and security and deployment of what they've worked on.
I used Kafka originally because it was mandated as part of the top-level IT requirements at a Fortune 500 client. What I found was that it was orders of magnitude more complex ...and powerful than my daily Beanstalkd , and far more flexible, resilient, and manageable than RabbitMQ.
So for any case where utmost flexibility and resilience are part of the deal, I would use Kafka again. But due to the complexities involved, for any time where this level of scalability is not required, I would probably just use Beanstalkd for its simplicity.
I tend to find RabbitMQ to be in an uncomfortable middle place between these two extremities.
Our backend is serverless based, with many AWS Lambda , with CI/CD, using CircleCI and Serverless. This allows to develop with awesome agility and move fast. Since we update our lambdas daily, we needed a way to make sure we did not run into AWS's max limit of versions per lambda. We use the open source in link below to clear them out and stay clear of the limit.
In #Aliadoc, we're exploring the crowdfunding option to get traction before launch. We are building a SaaS platform for website design customization.
For the Admin UI and website editor we use React and we're currently transitioning from a Create React App setup to a custom one because our needs have become more specific. We use CloudFlare as much as possible, it's a great service.
For routing dynamic resources and proxy tasks to feed websites to the editor we leverage CloudFlare Workers for improved responsiveness. We use Firebase for our hosting needs and user authentication while also using several Cloud Functions for Firebase to interact with other services along with Google App Engine and Google Cloud Storage, but also the Real Time Database is on the radar for collaborative website editing.
We generally hate configuration but honestly because of the stage of our project we lack resources for doing heavy sysops work. So we are basically just relying on Serverless technologies as much as we can to do all server side processing.
Visual Studio Code definitively makes programming a much easier and enjoyable task, we just love it. We combine it with Bitbucket for our source code control needs.
AWS Lambda Serverless Amazon CloudWatch Azure Functions Google Cloud Functions Node.js
In the last year or so, I moved all Checkly monitoring workloads to AWS Lambda. Here are some stats:
- We run three core functions in all AWS regions. They handle API checks, browser checks and setup / teardown scripts. Check our docs to find out what that means.
- All functions are hooked up to SNS topics but can also be triggered directly through AWS SDK calls.
- The busiest function is a plumbing function that forwards data to our database. It is invoked anywhere between 7000 and 10.000 times per hour with an average duration of about 179 ms.
- We run separate dev and test versions of each function in each region.
Moving all this to AWS Lambda took some work and considerations. The blog post linked below goes into the following topics:
- Why Lambda is an almost perfect match for SaaS. Especially when you're small.
- Why I don't use a "big" framework around it.
- Why distributed background jobs triggered by queues are Lambda's raison d'être.
- Why monitoring & logging is still an issue.
We are in the process of building a modern content platform to deliver our content through various channels. We decided to go with Microservices architecture as we wanted scale. Microservice architecture style is an approach to developing an application as a suite of small independently deployable services built around specific business capabilities. You can gain modularity, extensive parallelism and cost-effective scaling by deploying services across many distributed servers. Microservices modularity facilitates independent updates/deployments, and helps to avoid single point of failure, which can help prevent large-scale outages. We also decided to use Event Driven Architecture pattern which is a popular distributed asynchronous architecture pattern used to produce highly scalable applications. The event-driven architecture is made up of highly decoupled, single-purpose event processing components that asynchronously receive and process events.
To build our #Backend capabilities we decided to use the following: 1. #Microservices - Java with Spring Boot , Node.js with ExpressJS and Python with Flask 2. #Eventsourcingframework - Amazon Kinesis , Amazon Kinesis Firehose , Amazon SNS , Amazon SQS, AWS Lambda 3. #Data - Amazon RDS , Amazon DynamoDB , Amazon S3 , MongoDB Atlas
To build #Webapps we decided to use Angular 2 with RxJS
#Devops - GitHub , Travis CI , Terraform , Docker , Serverless
Front-end messages are logged to Kafka by our API and application servers. We have batch processing (on the middle-left) and real-time processing (on the middle-right) pipelines to process the experiment data. For batch processing, after daily raw log get to s3, we start our nightly experiment workflow to figure out experiment users groups and experiment metrics. We use our in-house workflow management system Pinball to manage the dependencies of all these MapReduce jobs.
Building out real-time streaming server to present data insights to Coolfront Mobile customers and internal sales and marketing teams.