Here are some stack decisions, common use cases, and reviews from companies and developers who chose Celery in their tech stack.
Sentry started as (and remains) an open-source project, growing out of an error logging tool built in 2008. That original build nine years ago was Django and Celery (Python’s asynchronous task queue), with PostgreSQL as the database and Redis as the power behind Celery.
We displayed a truly shrewd notion of branding even then, giving the project a catchy name that companies the world over remain jealous of to this day: django-db-log. For the longest time, Sentry’s subtitle on GitHub was “A simple Django app, built with love.” A slightly more accurate description probably would have included Starcraft and Soylent alongside love; regardless, this captured what Sentry was all about.
As Sentry runs throughout the day, there are about 50 different offline tasks that we execute—anything from “process this event, pretty please” to “send all of these cool people some emails.” Some of these tasks execute once a day, and some execute thousands of times per second.
Managing this variety requires a reliably high-throughput message-passing technology. We use Celery with RabbitMQ as its broker, and we stumbled upon a great RabbitMQ feature called Federation that allows us to partition our task queue across any number of RabbitMQ servers and gives us the confidence that, if any single server gets backlogged, others will pitch in and distribute some of the backlogged tasks to their consumers.
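As a minimal sketch of the queue-partitioning side of this setup (the task and queue names here are hypothetical, and Federation itself is configured on the RabbitMQ brokers, not in Celery), routing each kind of work to its own queue looks like this:

```python
from celery import Celery

# Hypothetical app and broker URL; any RabbitMQ node in the federation works.
app = Celery('sentry_like', broker='amqp://guest:guest@localhost:5672//')

# Send each kind of work to its own queue so a backlog in one queue
# never blocks the others.
app.conf.task_routes = {
    'tasks.process_event': {'queue': 'events'},
    'tasks.send_email': {'queue': 'emails'},
}

@app.task(name='tasks.process_event')
def process_event(event_id):
    ...  # process a single event, pretty please

@app.task(name='tasks.send_email')
def send_email(user_id):
    ...  # send one of those cool people an email

# A dedicated worker can then drain a single queue:
#   celery -A sentry_like worker -Q events
```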
A major aspect of Codecov is the use of long-running asynchronous tasks to process large amounts of test coverage data uploaded by our users. Being a Python stack, Celery felt like a natural fit to manage Codecov's long-running tasks. We rely on Celery to manage all our background queues and asynchronous scheduling. Celery lets us set timeouts for different tasks, which has been instrumental in maintaining our queue in production. Celery also interfaces easily with Redis as a backend store, which allowed it to slot neatly into our existing infrastructure.
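As a hedged illustration of those two features (per-task timeouts and Redis as the backend), a minimal sketch with a hypothetical task might look like this:

```python
from celery import Celery
from celery.exceptions import SoftTimeLimitExceeded

# Hypothetical app; Redis serves as both broker and result backend here.
app = Celery(
    'codecov_like',
    broker='redis://localhost:6379/0',
    backend='redis://localhost:6379/1',
)

# soft_time_limit raises inside the task so it can clean up;
# time_limit hard-kills the worker process shortly after.
@app.task(soft_time_limit=300, time_limit=360)
def process_coverage_report(upload_id):
    try:
        ...  # parse and aggregate the uploaded coverage data
    except SoftTimeLimitExceeded:
        ...  # give up gracefully and leave the queue healthy
```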
Data science and engineering teams at Lyft maintain several big data pipelines that serve as the foundation for various types of analysis throughout the business.
Apache Airflow sits at the center of this big data infrastructure, allowing users to “programmatically author, schedule, and monitor data pipelines.” Airflow is an open source tool, and “Lyft is the very first Airflow adopter in production since the project was open sourced around three years ago.”
There are several key components of the architecture. A web UI allows users to view the status of their queries, along with an audit trail of any modifications to the query. A metadata database stores things like job status and task instance status. A multi-process scheduler handles job requests and triggers the executor to execute those tasks.
Airflow supports several executors, though Lyft uses CeleryExecutor to scale task execution in production. Airflow is deployed to three Amazon Auto Scaling Groups, each associated with its own Celery queue.
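To make that concrete, here is a hedged sketch of an Airflow 1.x-style DAG that pins a task to a named Celery queue; the DAG, task, and queue names are hypothetical, and it assumes executor = CeleryExecutor is set in airflow.cfg:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    dag_id='example_pipeline',        # hypothetical DAG name
    start_date=datetime(2018, 1, 1),
    schedule_interval='@daily',
)

# With CeleryExecutor, the `queue` argument routes this task to a specific
# Celery queue; a worker started with `airflow worker -q high_memory`
# (e.g. one of the Auto Scaling Groups) consumes only that queue.
heavy_task = BashOperator(
    task_id='aggregate_events',
    bash_command='python aggregate.py',
    queue='high_memory',
    dag=dag,
)
```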
Audit logs supplied to the web UI are powered by the existing Airflow audit logs as well as Flask signals.
Datadog, Statsd, Grafana, and PagerDuty are all used to monitor the Airflow system.
For orchestrating the creation of the correct number of instances, managing errors and retries, and finally managing the deallocation of resources, we use RabbitMQ in conjunction with the Celery framework, along with a self-developed workflow engine.
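A minimal sketch of the retry handling Celery gives such a workflow engine out of the box (the task and its API call are hypothetical):

```python
from celery import Celery

app = Celery('provisioning', broker='amqp://localhost//')

# bind=True gives the task access to `self` so it can re-enqueue itself.
@app.task(bind=True, max_retries=5, default_retry_delay=60)
def provision_instance(self, instance_spec):
    try:
        ...  # call the cloud API to create an instance
    except Exception as exc:
        # Re-enqueue this task; Celery waits default_retry_delay seconds
        # between attempts and gives up after max_retries.
        raise self.retry(exc=exc)
```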
On both the backend and the application servers, long-running jobs and periodically scheduled jobs are executed via celery/celerybeat. It is highly reliable and flexible. Running it under gevent makes for even cleaner management.
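As a hedged sketch of that setup (the task names are hypothetical), a celerybeat schedule for the periodic work looks like this, with the worker started under the gevent pool, e.g. celery -A backend worker -P gevent:

```python
from celery import Celery
from celery.schedules import crontab

app = Celery('backend', broker='redis://localhost:6379/0')

# celerybeat reads this schedule and enqueues each task on time;
# a separate `celery -A backend beat` process drives it.
app.conf.beat_schedule = {
    'nightly-cleanup': {
        'task': 'tasks.cleanup_expired_sessions',
        'schedule': crontab(hour=4, minute=0),  # every day at 04:00
    },
    'sync-every-five-minutes': {
        'task': 'tasks.sync_external_data',
        'schedule': 300.0,  # seconds
    },
}
```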