Needs advice

We're looking to do a project for a company that has incoming data from two sources: MongoDB and MySQL. We need to combine the data from these two sources and write it to PostgreSQL in real time, at roughly 600,000 records per day. Which tool is the better fit for this use case, Airflow or Kafka?

Replies (2)

For getting the data out of MySQL and MongoDB, I would recommend using the Flink CDC connectors. With Apache Flink's JDBC sink connector, you'll then be able to send the data to PostgreSQL.

You'll be able to run all of this in a single Flink job (i.e. reading from the two sources and writing to one destination), no Kafka or Airflow required.
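To make this concrete, here is a rough sketch of what such a single Flink SQL job could look like. All table schemas, hostnames, credentials, and the join key are hypothetical placeholders; the connector option names follow the Flink CDC and JDBC connector documentation, but check them against the connector versions you actually use:

```sql
-- MySQL change stream via the mysql-cdc connector (hypothetical schema)
CREATE TABLE mysql_orders (
  id INT,
  customer_id STRING,
  amount DECIMAL(10, 2),
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = 'mysql-host',
  'port' = '3306',
  'username' = 'flink',
  'password' = '...',
  'database-name' = 'shop',
  'table-name' = 'orders'
);

-- MongoDB change stream via the mongodb-cdc connector (hypothetical schema)
CREATE TABLE mongo_customers (
  _id STRING,
  name STRING,
  PRIMARY KEY (_id) NOT ENFORCED
) WITH (
  'connector' = 'mongodb-cdc',
  'hosts' = 'mongo-host:27017',
  'database' = 'shop',
  'collection' = 'customers'
);

-- PostgreSQL destination via the JDBC sink connector
CREATE TABLE pg_combined (
  order_id INT,
  customer_name STRING,
  amount DECIMAL(10, 2),
  PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:postgresql://pg-host:5432/analytics',
  'table-name' = 'combined',
  'username' = 'flink',
  'password' = '...'
);

-- Continuously join the two change streams and upsert into PostgreSQL
INSERT INTO pg_combined
SELECT o.id, c.name, o.amount
FROM mysql_orders AS o
JOIN mongo_customers AS c ON o.customer_id = c._id;
```

Because both sources are CDC streams and the sink table declares a primary key, the JDBC sink runs in upsert mode, so updates and deletes in MySQL or MongoDB are reflected in PostgreSQL rather than appended as duplicates.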

If you are looking for a cloud SaaS solution, you could also check out my company's offering: in Decodable you can define all these sources and sinks using a GUI, and the platform takes care of running them for you. You can even try it for free.


According to the Airflow documentation, "Airflow is not a data streaming solution. Tasks do not move data from one to the other (though tasks can exchange metadata!)", so it won't give you real-time data. You can use Kafka combined with some Spark streaming jobs. Another approach is to use a change data capture (CDC) tool such as Airbyte or Debezium combined with Kafka.
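As one illustration of the CDC approach, this is roughly what registering a Debezium MySQL source connector against Kafka Connect could look like. Hostnames, credentials, and database names are hypothetical, and property names vary slightly between Debezium versions (e.g. newer releases use `topic.prefix` where older ones used `database.server.name`), so verify against the Debezium documentation for your version:

```json
{
  "name": "mysql-source",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql-host",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "...",
    "database.server.id": "184054",
    "topic.prefix": "shop",
    "database.include.list": "shop",
    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    "schema.history.internal.kafka.topic": "schema-changes.shop"
  }
}
```

With this in place, row-level changes from MySQL land in Kafka topics (one per table, prefixed with `shop.`), and a downstream consumer or sink connector can merge them with the MongoDB stream and write the result to PostgreSQL.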
