Azure Functions vs Kafka: What are the differences?
What is Azure Functions? Listen and react to events across your stack. Azure Functions is an event driven, compute-on-demand experience that extends the existing Azure application platform with capabilities to implement code triggered by events occurring in virtually any Azure or 3rd party service as well as on-premises systems.
What is Kafka? Distributed, fault tolerant, high throughput pub-sub messaging system. Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design.
Azure Functions can be classified as a tool in the "Serverless / Task Processing" category, while Kafka is grouped under "Message Queue".
Some of the features offered by Azure Functions are:
- Easily schedule event-driven tasks across services
- Expose Functions as HTTP API endpoints
- Scale Functions based on customer demand
On the other hand, Kafka provides the following key features:
- Written at LinkedIn in Scala
- Used by LinkedIn to offload processing of all page and other views
- Defaults to using persistence, uses OS disk cache for hot data (has higher throughput then any of the above having persistence enabled)
"Pay only when invoked" is the primary reason why developers consider Azure Functions over the competitors, whereas "High-throughput" was stated as the key factor in picking Kafka.
Kafka is an open source tool with 12.5K GitHub stars and 6.7K GitHub forks. Here's a link to Kafka's open source repository on GitHub.
Slack, Shopify, and SendGrid are some of the popular companies that use Kafka, whereas Azure Functions is used by Property With Potential, OneWire, and Veris. Kafka has a broader approval, being mentioned in 501 company stacks & 451 developers stacks; compared to Azure Functions, which is listed in 27 company stacks and 21 developer stacks.
What is Azure Functions?
What is Kafka?
Need advice about which tool to choose?Ask the StackShare community!
Sign up to add, upvote and see more prosMake informed product decisions
What are the cons of using Azure Functions?
Sign up to get full access to all the companiesMake informed product decisions
Sign up to get full access to all the tool integrationsMake informed product decisions
CodeFactor being a #SAAS product, our goal was to run on a cloud-native infrastructure since day one. We wanted to stay product focused, rather than having to work on the infrastructure that supports the application. We needed a cloud-hosting provider that would be reliable, economical and most efficient for our product.
CodeFactor.io aims to provide an automated and frictionless code review service for software developers. That requires agility, instant provisioning, autoscaling, security, availability and compliance management features. We looked at the top three #IAAS providers that take up the majority of market share: Amazon's Amazon EC2 , Microsoft's Microsoft Azure, and Google Compute Engine.
AWS has been available since 2006 and has developed the most extensive services ant tools variety at a massive scale. Azure and GCP are about half the AWS age, but also satisfied our technical requirements.
It is worth noting that even though all three providers support Docker containerization services, GCP has the most robust offering due to their investments in Kubernetes. Also, if you are a Microsoft shop, and develop in .NET - Visual Studio Azure shines at integration there and all your existing .NET code works seamlessly on Azure. All three providers have serverless computing offerings (AWS Lambda, Azure Functions, and Google Cloud Functions). Additionally, all three providers have machine learning tools, but GCP appears to be the most developer-friendly, intuitive and complete when it comes to #Machinelearning and #AI.
The prices between providers are competitive across the board. For our requirements, AWS would have been the most expensive, GCP the least expensive and Azure was in the middle. Plus, if you #Autoscale frequently with large deltas, note that Azure and GCP have per minute billing, where AWS bills you per hour. We also applied for the #Startup programs with all three providers, and this is where Azure shined. While AWS and GCP for startups would have covered us for about one year of infrastructure costs, Azure Sponsorship would cover about two years of CodeFactor's hosting costs. Moreover, Azure Team was terrific - I felt that they wanted to work with us where for AWS and GCP we were just another startup.
In summary, we were leaning towards GCP. GCP's advantages in containerization, automation toolset, #Devops mindset, and pricing were the driving factors there. Nevertheless, we could not say no to Azure's financial incentives and a strong sense of partnership and support throughout the process.
Bottom line is, IAAS offerings with AWS, Azure, and GCP are evolving fast. At CodeFactor, we aim to be platform agnostic where it is practical and retain the flexibility to cherry-pick the best products across providers.
Why we built Marmaray, an open source generic data ingestion and dispersal framework and library for Apache Hadoop :
Built and designed by our Hadoop Platform team, Marmaray is a plug-in-based framework built on top of the Hadoop ecosystem. Users can add support to ingest data from any source and disperse to any sink leveraging the use of Apache Spark . The name, Marmaray, comes from a tunnel in Turkey connecting Europe and Asia. Similarly, we envisioned Marmaray within Uber as a pipeline connecting data from any source to any sink depending on customer preference:
(Direct GitHub repo: https://github.com/uber/marmaray Kafka Kafka Manager )
In a couple of recent projects we had an opportunity to try out the new Serverless approach to building web applications. It wasn't necessarily a question if we should use any particular vendor but rather "if" we can consider serverless a viable option for building apps. Obviously our goal was also to get a feel for this technology and gain some hands-on experience.
We did consider AWS Lambda, Firebase from Google as well as Azure Functions. Eventually we went with AWS Lambdas.PROS
- No servers to manage (obviously!)
- Limited fixed costs – you pay only for used time
- Automated scaling and balancing
- Automatic failover (or, at this level of abstraction, no failover problem at all)
- Security easier to provide and audit
- Low overhead at the start (with the certain level of knowledge)
- Short time to market
- Easy handover - deployment coupled with code
- Perfect choice for lean startups with fast-paced iterations
- Augmentation for the classic cloud, server(full) approach
- Not much know-how and best practices available about structuring the code and projects on the market
- Not suitable for complex business logic due to the risk of producing highly coupled code
- Cost difficult to estimate (helpful tools: serverlesscalc.com)
- Difficulty in migration to other platforms (Vendor lock⚠️)
- Little engineers with experience in serverless on the job market
- Steep learning curve for engineers without any cloud experience
More details are on our blog: https://evojam.com/blog/2018/12/5/should-you-go-serverless-meet-the-benefits-and-flaws-of-new-wave-of-cloud-solutions I hope it helps 🙌 & I'm curious of your experiences.
I use Kafka because it has almost infinite scaleability in terms of processing events (could be scaled to process hundreds of thousands of events), great monitoring (all sorts of metrics are exposed via JMX).
Downsides of using Kafka are: - you have to deal with Zookeeper - you have to implement advanced routing yourself (compared to RabbitMQ it has no advanced routing)
The question for which Message Queue to use mentioned "availability, distributed, scalability, and monitoring". I don't think that this excludes many options already. I does not sound like you would take advantage of Kafka's strengths (replayability, based on an even sourcing architecture). You could pick one of the AMQP options.
I would recommend the RabbitMQ message broker, which not only implements the AMQP standard 0.9.1 (it can support 1.x or other protocols as well) but has also several very useful extensions built in. It ticks the boxes you mentioned and on top you will get a very flexible system, that allows you to build the architecture, pick the options and trade-offs that suite your case best.
For more information about RabbitMQ, please have a look at the linked markdown I assembled. The second half explains many configuration options. It also contains links to managed hosting and to libraries (though it is missing Python's - which should be Puka, I assume).
I used Kafka originally because it was mandated as part of the top-level IT requirements at a Fortune 500 client. What I found was that it was orders of magnitude more complex ...and powerful than my daily Beanstalkd , and far more flexible, resilient, and manageable than RabbitMQ.
So for any case where utmost flexibility and resilience are part of the deal, I would use Kafka again. But due to the complexities involved, for any time where this level of scalability is not required, I would probably just use Beanstalkd for its simplicity.
I tend to find RabbitMQ to be in an uncomfortable middle place between these two extremities.
AWS Lambda Serverless Amazon CloudWatch Azure Functions Google Cloud Functions Node.js
In the last year or so, I moved all Checkly monitoring workloads to AWS Lambda. Here are some stats:
- We run three core functions in all AWS regions. They handle API checks, browser checks and setup / teardown scripts. Check our docs to find out what that means.
- All functions are hooked up to SNS topics but can also be triggered directly through AWS SDK calls.
- The busiest function is a plumbing function that forwards data to our database. It is invoked anywhere between 7000 and 10.000 times per hour with an average duration of about 179 ms.
- We run separate dev and test versions of each function in each region.
Moving all this to AWS Lambda took some work and considerations. The blog post linked below goes into the following topics:
- Why Lambda is an almost perfect match for SaaS. Especially when you're small.
- Why I don't use a "big" framework around it.
- Why distributed background jobs triggered by queues are Lambda's raison d'être.
- Why monitoring & logging is still an issue.
Poor developer experience
Front-end messages are logged to Kafka by our API and application servers. We have batch processing (on the middle-left) and real-time processing (on the middle-right) pipelines to process the experiment data. For batch processing, after daily raw log get to s3, we start our nightly experiment workflow to figure out experiment users groups and experiment metrics. We use our in-house workflow management system Pinball to manage the dependencies of all these MapReduce jobs.
Building out real-time streaming server to present data insights to Coolfront Mobile customers and internal sales and marketing teams.
I used Azure functions as part of an integration service when creating a bulk insert module in azure.