What is Kafka?
Who uses Kafka?
Here are some stack decisions, common use cases and reviews by companies and developers who chose Kafka in their tech stack.
I want to collect the dependency data that Java applications produce when they are built with Maven in CI/CD tools. I want to know how to pick a collection technology, and what the pros and cons are between Kafka and RabbitMQ.
My process is like this: I get data once a month, either from Google BigQuery or as parquet files from Azure Blob Storage. I have a script that does some cleaning and then stores the result as partitioned parquet files, because the following process cannot handle loading all the data into memory.
The next process runs a heavy computation in parallel (per partition) and stores 3 intermediate versions as parquet files: two are used for statistics, and the third is filtered to create the final files.
I make a report based on the two statistics files in a Jupyter notebook and convert it to HTML.
- Everything is done with vanilla Python and pandas.
- Sometimes I may get the data in a different format.
- The cloud service is Microsoft Azure.
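The per-partition step described above could be sketched roughly as follows. This is only an illustration using the standard library: in-memory lists stand in for the partitioned parquet files, and the `region` and `value` fields, the function names, and the summary statistics are all hypothetical placeholders for the real data and the real heavy computation.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def partition_rows(rows, key):
    """Group rows by a partition key so no later step needs the full dataset."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)


def process_partition(item):
    """Hypothetical stand-in for the heavy computation on one partition."""
    name, rows = item
    values = [r["value"] for r in rows]
    return name, {"count": len(values), "mean": mean(values)}


def run_pipeline(rows, key="region", workers=4):
    """Partition the rows, then process every partition in parallel."""
    partitions = partition_rows(rows, key)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(process_partition, partitions.items()))


# Made-up sample data, only to exercise the sketch.
rows = [
    {"region": "eu", "value": 10},
    {"region": "eu", "value": 20},
    {"region": "us", "value": 5},
]
stats = run_pipeline(rows)
```

Threads keep the sketch simple; for CPU-bound pandas work, `concurrent.futures.ProcessPoolExecutor` is usually the more suitable executor, since each partition can then be computed in a separate process.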
What I'm considering is the following:
Get the data with Kafka or with native Python, do the first processing, and store the data in Apache Druid. The second processing would be done with Apache Spark, reading the data from Apache Druid.
The intermediate states can be stored in Druid too, and visualization would be done with Apache Superset.
Hi all, I'm working on a project where I have to implement message queues. I'd like to hear about your personal experience with these queues: which is best, RabbitMQ or Kafka?
We're looking to do a project for a company that has incoming data from 2 sources, namely MongoDB and MySQL. We need to combine the data from these 2 sources and make it available in real time in PostgreSQL. The expected volume is about 600,000 records per day. Which tool would be better for this use case: Airflow or Kafka?
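To put the stated volume in perspective, a quick back-of-the-envelope calculation converts records per day into an average rate. This is only an average: it assumes records arrive evenly over the day, which the post does not say, so real peaks may be considerably higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rate(records_per_day):
    """Average records per second, assuming an even spread over the day."""
    return records_per_day / SECONDS_PER_DAY


rate = average_rate(600_000)
print(f"{rate:.2f} records/second on average")  # prints "6.94 records/second on average"
```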
Kindly suggest the best tool for generating a load of 10 million+ concurrent users. The tool must support MQTT traffic, REST APIs, interfaces such as Kafka, WebSockets, and persistent HTTP connections, as well as the different authentication types, so that we can assess support and coverage.
The tool should also integrate into CI pipelines such as Azure Pipelines, GitHub, and Jenkins.
I have recently started using Confluent/Kafka cloud. We want to do some stream processing. As I was going through Kafka, I came across Kafka Streams and KSQL. Both seem to be a good fit for stream processing, but I could not understand which one should be used and whether one has any advantage over the other. We will be using a Confluent/Kafka managed cloud instance. In the near future, our producers and consumers will be running on premises and we will be interacting with Confluent Cloud.
Also, Confluent Cloud Kafka has a primitive interface; is there a better UI to manage a Kafka cloud cluster?
- Written at LinkedIn in Scala
- Used by LinkedIn to offload processing of all page and other views
- Defaults to using persistence, uses OS disk cache for hot data (has higher throughput than any of the above having persistence enabled)
- Supports both online and offline processing
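The last two points are related: because records are kept in a persistent log rather than removed once delivered, a real-time consumer and a batch consumer can each read the same data at their own pace. The toy model below illustrates that idea only. `ToyLog`, its methods, and the consumer names are hypothetical and are not Kafka's API; a Python list stands in for the on-disk log.

```python
class ToyLog:
    """Toy in-memory stand-in for a persistent, append-only log (not Kafka's API)."""

    def __init__(self):
        self._records = []   # records are appended and never removed on read
        self._offsets = {}   # consumer name -> next offset to read

    def append(self, record):
        """Append a record and return its offset in the log."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, consumer, max_records=10):
        """Return the next batch for this consumer and advance only its offset."""
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch


log = ToyLog()
for view in ["page:/home", "page:/jobs", "page:/feed"]:
    log.append(view)

# An "online" consumer reads in small batches as events arrive...
online = log.poll("realtime-dashboard", max_records=2)
# ...while an "offline" consumer can read everything later, from the start.
offline = log.poll("nightly-batch")
```

Each consumer tracks its own position, so the batch reader still sees all three records even though the real-time reader has already consumed two of them.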