Airflow

A platform to programmatically author, schedule and monitor data pipelines, by Airbnb

What is Airflow?

Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command line utilities make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress and troubleshoot issues when needed.
Airflow is a tool in the Workflow Manager category of a tech stack.
Airflow is an open source tool with 15.3K GitHub stars and 5.8K GitHub forks.
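
The idea of running tasks "while following the specified dependencies" can be illustrated without Airflow itself. The sketch below is plain Python (3.9+, standard library only) and is not the Airflow API; the task names and the `run_order` helper are hypothetical and exist only for illustration. It models a small pipeline as a DAG and computes an execution order in which every task comes after the tasks it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
# "load" must wait for both cleaning steps, which both wait for "extract".
dependencies = {
    "extract": set(),
    "clean_users": {"extract"},
    "clean_rides": {"extract"},
    "load": {"clean_users", "clean_rides"},
}


def run_order(deps):
    """Return one valid execution order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for task in run_order(dependencies):
        print("running", task)
```

The two cleaning tasks do not depend on each other, so either may come first; a real scheduler could hand them to separate workers at the same time. If the mapping contained a cycle, `TopologicalSorter` would raise `graphlib.CycleError`, which is why the graph has to be acyclic.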

Who uses Airflow?

Companies
127 companies reportedly use Airflow in their tech stacks, including Airbnb, Slack, and 9GAG.

Developers
244 developers on StackShare have stated that they use Airflow.

Airflow Integrations

Why do developers like Airflow?

Here’s a list of reasons why companies and developers use Airflow
Airflow Reviews

Here are some stack decisions, common use cases and reviews by companies and developers who chose Airflow in their tech stack.

StackShare Editors
Grafana
StatsD
Airflow
PagerDuty
Datadog
Celery
AWS EC2
Flask

Data science and engineering teams at Lyft maintain several big data pipelines that serve as the foundation for various types of analysis throughout the business.

Apache Airflow sits at the center of this big data infrastructure, allowing users to “programmatically author, schedule, and monitor data pipelines.” Airflow is an open source tool, and “Lyft is the very first Airflow adopter in production since the project was open sourced around three years ago.”

There are several key components of the architecture. A web UI allows users to view the status of their queries, along with an audit trail of any modifications to the query. A metadata database stores things like job status and task instance status. A multi-process scheduler handles job requests and triggers the executor to run the resulting tasks.

Airflow supports several executors; Lyft uses CeleryExecutor to scale task execution in production. Airflow is deployed to three Amazon Auto Scaling Groups, each associated with a Celery queue.

Audit logs supplied to the web UI are powered by the existing Airflow audit logs as well as Flask signals.

Datadog, StatsD, Grafana, and PagerDuty are all used to monitor the Airflow system.

StackShare Editors
Prometheus
Chef
Consul
Memcached
Hack
Swift
Hadoop
Terraform
Airflow
Apache Spark
Kubernetes
gRPC
HHVM (HipHop Virtual Machine)
Presto
Kotlin
Apache Thrift

Since the beginning, Cal Henderson has been the CTO of Slack. Earlier this year, he commented on a Quora question summarizing their current stack.

Apps
  • Web: a mix of JavaScript/ES6 and React.
  • Desktop: Electron, to ship it as a desktop application.
  • Android: a mix of Java and Kotlin.
  • iOS: a mix of Objective-C and Swift.
Backend
  • The core application and the API are written in PHP/Hack and run on HHVM.
  • The data is stored in MySQL using Vitess.
  • Caching is done using Memcached and MCRouter.
  • The search service relies on SolrCloud, along with various Java services.
  • The messaging system uses WebSockets with many services in Java and Go.
  • Load balancing is done using HAProxy, with Consul for configuration.
  • Most services talk to each other over gRPC, with some Thrift and JSON-over-HTTP.
  • The voice and video calling service was built in Elixir.
Data warehouse
  • Built using open source tools including Presto, Spark, Airflow, Hadoop and Kafka.
Etc
Airflow

I use Airflow because it's the gold standard for scheduling batch data jobs. It comes with a bit of a learning curve given the extensive UI and working with different connectors. However, it has a lot of great retry features, and the visual DAGs help with a lot of troubleshooting.


Airflow's Features

  • Dynamic: Airflow pipelines are configuration as code (Python), allowing for dynamic pipeline generation. This allows for writing code that instantiates pipelines dynamically.
  • Extensible: Easily define your own operators and executors, and extend the library so that it fits the level of abstraction that suits your environment.
  • Elegant: Airflow pipelines are lean and explicit. Parameterizing your scripts is built into the core of Airflow using the powerful Jinja templating engine.
  • Scalable: Airflow has a modular architecture and uses a message queue to orchestrate an arbitrary number of workers. Airflow is ready to scale to infinity.
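
To make the "Dynamic" point concrete, here is a minimal plain-Python sketch of what generating a pipeline from code can look like. It deliberately avoids the Airflow API so that it runs with the standard library alone (Python 3.9+). The table names, the task naming scheme and the `build_pipeline` helper are all assumptions made for illustration. In Airflow the same loop would create operator objects inside a DAG definition rather than dictionary entries.

```python
from graphlib import TopologicalSorter

# Hypothetical inputs; in practice these might be read from a config file.
TABLES = ["users", "rides", "payments"]


def build_pipeline(tables):
    """Build a task -> upstream-tasks mapping with one extract/load pair per table."""
    pipeline = {"start": set(), "done": set()}
    for table in tables:
        extract = f"extract_{table}"
        load = f"load_{table}"
        pipeline[extract] = {"start"}   # every extract waits for "start"
        pipeline[load] = {extract}      # each load waits for its own extract
        pipeline["done"].add(load)      # "done" waits for every load
    return pipeline


if __name__ == "__main__":
    pipeline = build_pipeline(TABLES)
    for task in TopologicalSorter(pipeline).static_order():
        print(task)
```

Adding a table to `TABLES` adds two tasks and their dependencies without touching the rest of the code, which is the practical benefit of defining pipelines in a general-purpose language instead of static configuration.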

Airflow Alternatives & Comparisons

What are some alternatives to Airflow?
Luigi
It is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
Apache NiFi
An easy to use, powerful, and reliable system to process and distribute data. It supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic.
Jenkins
In a nutshell, Jenkins CI is the leading open-source continuous integration server. Built with Java, it provides over 300 plugins to support building and testing virtually any project.
AWS Step Functions
AWS Step Functions makes it easy to coordinate the components of distributed applications and microservices using visual workflows. Building applications from individual components that each perform a discrete function lets you scale and change applications quickly.
Apache Beam
It implements batch and streaming data processing jobs that run on any execution engine. It executes pipelines on multiple execution environments.

Airflow's Followers
352 developers follow Airflow to keep up with related blogs and decisions.