Jun 23, 2020
| | Apache Beam | Kafka Streams |
| --- | --- | --- |
| Description | It implements batch and streaming data processing jobs that run on any execution engine. It executes pipelines on multiple execution environments. | It is a client library for building applications and microservices, where the input and output data are stored in Kafka clusters. It combines the simplicity of writing and deploying standard Java and Scala applications on the client side with the benefits of Kafka's server-side cluster technology. |
| Stacks | 182 | 397 |
| Followers | 361 | 478 |
| Votes | 14 | 0 |
| Pros & Cons | No community feedback yet | No community feedback yet |
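To make the first description concrete, here is a minimal sketch using the Apache Beam Python SDK: the same pipeline runs locally by default but can be handed to other execution engines unchanged. The pipeline contents are assumptions for illustration only.

```python
# Minimal Apache Beam pipeline sketch (illustrative data; runs on the
# local DirectRunner by default, portable to other runners unchanged).
import apache_beam as beam

with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Create" >> beam.Create(["kafka", "beam", "streams", "beam"])
        | "PairWithOne" >> beam.Map(lambda word: (word, 1))
        | "CountPerWord" >> beam.CombinePerKey(sum)
        | "Print" >> beam.Map(print)
    )
```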

Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command line utilities make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed.
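A minimal sketch of such a DAG, assuming a standard Airflow 2.x installation; the DAG id and task names are hypothetical:

```python
# A minimal Airflow DAG sketch (hypothetical names; assumes Airflow 2.x).
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_etl",              # hypothetical DAG name
    start_date=datetime(2020, 6, 23),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extract")
    transform = BashOperator(task_id="transform", bash_command="echo transform")
    load = BashOperator(task_id="load", bash_command="echo load")

    # The scheduler follows these dependencies when dispatching tasks to workers.
    extract >> transform >> load
```

With this file placed in the DAGs folder, the scheduler picks it up and the web UI shows the three tasks and the dependency chain between them.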

An easy-to-use, powerful, and reliable system to process and distribute data. It supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic.

It makes it easy to automate all your software workflows, now with world-class CI/CD. Build, test, and deploy your code right from GitHub. Make code reviews, branch management, and issue triaging work the way you want.

Apache Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.

It is a data streaming platform based on Apache Kafka: a full-scale streaming platform, capable of not only publish-and-subscribe, but also the storage and processing of data within the stream.
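As a rough sketch of the publish-and-subscribe side, the confluent-kafka Python client can produce to and consume from a topic as below; the broker address, topic, and consumer group are assumptions for illustration:

```python
# Publish-and-subscribe sketch with the confluent-kafka Python client.
# Broker address, topic, and group id are assumptions for illustration.
from confluent_kafka import Producer, Consumer

producer = Producer({"bootstrap.servers": "localhost:9092"})
producer.produce("events", key="user-1", value="signup")
producer.flush()  # block until the message is delivered

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "example-group",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["events"])
msg = consumer.poll(timeout=5.0)
if msg is not None and msg.error() is None:
    print(msg.key(), msg.value())
consumer.close()
```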

Developer framework to orchestrate multiple services and APIs into your software application using logic triggered by events and time. Build ETL processes, A/B testing, real-time alerts and personalized user experiences with custom logic.

It is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, and more. It also comes with Hadoop support built in.
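A minimal sketch of such a pipeline, with hypothetical task and file names, might look like this:

```python
# A minimal Luigi pipeline sketch (hypothetical task and file names).
import luigi


class Extract(luigi.Task):
    def output(self):
        return luigi.LocalTarget("raw.txt")

    def run(self):
        with self.output().open("w") as f:
            f.write("1\n2\n3\n")


class Summarize(luigi.Task):
    def requires(self):
        # Luigi resolves this dependency before running Summarize.
        return Extract()

    def output(self):
        return luigi.LocalTarget("summary.txt")

    def run(self):
        with self.input().open() as f:
            total = sum(int(line) for line in f)
        with self.output().open("w") as f:
            f.write(str(total))


if __name__ == "__main__":
    luigi.build([Summarize()], local_scheduler=True)
```

Running the file builds Summarize and, via requires(), its Extract dependency first; rerunning it is a no-op because both output targets already exist.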

Build and map powerful workflows across tools to save your team time. No coding required. Create rules to define what information flows between each of your tools, in minutes.

KSQL is an open source streaming SQL engine for Apache Kafka. It provides a simple and completely interactive SQL interface for stream processing on Kafka; no need to write code in a programming language such as Java or Python. KSQL is open-source (Apache 2.0 licensed), distributed, scalable, reliable, and real-time.
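The SQL statement itself is the whole interface; it can be submitted to the KSQL server's REST endpoint (default port 8088) without writing any stream-processing code. In the sketch below, Python is used only to post the statement, and the stream, topic, and column names are hypothetical:

```python
# Submitting a KSQL statement over the server's REST API (a sketch;
# assumes a KSQL server on its default port 8088, with hypothetical
# stream, topic, and column names).
import requests

statement = """
    CREATE STREAM pageviews (user_id VARCHAR, url VARCHAR)
      WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');
"""

resp = requests.post(
    "http://localhost:8088/ksql",
    json={"ksql": statement, "streamsProperties": {}},
)
print(resp.status_code, resp.json())
```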
