Airflow vs Amazon SQS: What are the differences?

Introduction

In this analysis, we explore the key differences between Airflow and Amazon SQS. Airflow is a workflow management platform, while Amazon SQS is a managed message queuing service; both are commonly used in data pipelines and distributed systems. Their use cases overlap in places, but distinct differences set them apart.

  1. Scalability: One major difference between Airflow and Amazon SQS is their scalability. Airflow is designed to handle large-scale workflows with ease, allowing for the execution of complex workflows across multiple machines. On the other hand, Amazon SQS provides a highly scalable message queuing service, primarily used for decoupling the sending and receiving of messages in a distributed system architecture.

  2. Functionality: Airflow offers a rich set of functionalities and features that allow users to design and manage workflows effectively. It provides a readily available user interface for scheduling, monitoring, and managing workflows, along with a wide range of pre-built operators and connectors. In contrast, Amazon SQS focuses solely on message queuing functionality, providing a simple and reliable messaging service with no additional workflow management features.

  3. Deployment: Airflow can be self-hosted on-premises or deployed on cloud infrastructure, providing flexibility in terms of deployment options. It can be installed and managed on any infrastructure, allowing for customization and control over the environment. On the other hand, Amazon SQS is a fully managed service provided by Amazon Web Services (AWS), meaning that it is hosted and maintained by AWS, relieving users of the burden of deployment and management.

  4. Integration: Airflow provides seamless integration with various databases, message brokers, and third-party services, offering a wide range of connectors and operators out of the box. This makes it easy to incorporate Airflow into existing technologies and systems. In contrast, Amazon SQS integrates seamlessly with other AWS services, allowing for efficient and reliable communication within the AWS ecosystem.

  5. Message Reliability: Another difference between Airflow and Amazon SQS lies in their reliability mechanisms. Airflow tracks the state of every task in a workflow and can retry failed tasks, but it does not by itself guarantee exactly-once execution, so tasks are usually written to be idempotent. Amazon SQS, on the other hand, stores messages redundantly across multiple servers within a region so that messages are not lost; standard queues deliver each message at least once, while FIFO queues offer exactly-once processing.

  6. Cost Structure: Airflow is an open-source platform and can be used free of charge. However, the cost of running Airflow includes infrastructure costs for hosting the platform. On the other hand, Amazon SQS follows a pay-as-you-go pricing model, where users are billed based on the number of requests and data transfer in and out of the service. The cost structure of Amazon SQS is directly tied to the usage of the service.

In summary, Airflow and Amazon SQS differ in scalability, functionality, deployment options, integration capabilities, message reliability mechanisms, and cost structure.

Advice on Airflow and Amazon SQS
Pulkit Sapra
Needs advice on Amazon SQS, Kubernetes, and RabbitMQ

Hi! I am creating a scraping system in Django, which involves long-running tasks lasting between 1 minute and 1 day. As I am new to message brokers and task queues, I need advice on which architecture to use for my system (Amazon SQS, RabbitMQ, or Celery). The system should be autoscalable using Kubernetes (K8s) based on the number of pending tasks in the queue.

Replies (1)
Anis Zehani
Recommends Kafka

Hello, I highly recommend Apache Kafka; to me it's the best. You can deploy it in cluster mode inside K8s, so you can have a highly available (and auto-scalable) system.

Good luck

Meili Triantafyllidi
Software engineer at Digital Science · 6 upvotes · 488.9K views
Needs advice on Amazon SQS, RabbitMQ, and ZeroMQ

Hi, we have a ZeroMQ setup using a push/pull pattern. Traffic is growing, and we are starting to see cases where the service is unavailable or stuck. We want to:

* Not lose messages during service outages
* Safely restart the service without losing messages (ZeroMQ seems to require manually closing the socket in the receiver before a restart)

Do you have experience with this setup with ZeroMQ? Would you suggest RabbitMQ or Amazon SQS (we are in AWS setup) instead? Something else?

Thank you for your time

Replies (2)
Shishir Pandey
Recommends RabbitMQ

ZeroMQ is fast, but you need to build reliability yourself. There are a number of patterns described in the ZeroMQ guide. I have used RabbitMQ before, which gives a lot of functionality out of the box. You can probably use the worker queues example from the tutorial; it can also persist messages in the queue.

I haven't used Amazon SQS before. Another tool you could use is Kafka.

Kevin Deyne
Principal Software Engineer at Accurate Background · 5 upvotes · 224.5K views
Recommends RabbitMQ

Both would do the trick, but there are some nuances. We work with both.

From the sound of it, your main focus is "not losing messages". In that case, I would go with RabbitMQ with a high availability policy (ha-mode=all) and a main/retry/error queue pattern.

Push messages to an exchange, which sends them to the main queue. If an error occurs, push the failed message to the retry exchange, which forwards it to the retry queue. Give the retry queue an x-message-ttl and set the main exchange as its dead-letter exchange, so that expired messages flow back to the main queue. If a message has been retried several times, push it to the error exchange, where it can remain until someone has time to look at it.

This is a very useful and resilient pattern that allows you to never lose messages. With the high availability policy, you make sure that if one of your rabbitmq nodes dies, another can take over and messages are already mirrored to it.
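As a rough illustration of the pattern above, here is a minimal sketch of the two decisions involved: which arguments the retry queue is declared with, and where a failed message goes next. The exchange names, the retry limit, and the `x-retry-count` header are assumptions made up for this example (the header is maintained by the consumer, not by RabbitMQ); only `x-message-ttl` and `x-dead-letter-exchange` are actual RabbitMQ queue arguments.

```python
# Hypothetical names and limits, chosen only for this illustration.
MAIN_EXCHANGE = "main.exchange"
RETRY_EXCHANGE = "retry.exchange"
ERROR_EXCHANGE = "error.exchange"
MAX_RETRIES = 3


def retry_queue_arguments(main_exchange=MAIN_EXCHANGE, delay_ms=30_000):
    """Arguments for declaring the retry queue.

    Messages wait in the retry queue for delay_ms, expire, and are then
    dead-lettered back to the main exchange for another attempt.
    """
    return {
        "x-message-ttl": delay_ms,
        "x-dead-letter-exchange": main_exchange,
    }


def route_failed_message(headers, max_retries=MAX_RETRIES):
    """Pick the exchange for a message whose processing just failed.

    Returns (exchange_name, new_headers). The retry counter lives in a
    consumer-maintained header, which is an assumption of this sketch.
    """
    headers = dict(headers or {})
    attempts = int(headers.get("x-retry-count", 0))
    if attempts < max_retries:
        headers["x-retry-count"] = attempts + 1
        return RETRY_EXCHANGE, headers
    # Retries exhausted: park the message until someone looks at it.
    return ERROR_EXCHANGE, headers
```

A consumer would pass the dictionary from `retry_queue_arguments()` when declaring the retry queue, and call `route_failed_message()` in its error handler before republishing.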

This is not really possible with SQS, because SQS is a lot more focused on throughput and scaling. Combined with SNS it can do interesting things like deduplication of messages and such. That said, one thing core to its design is that messages have a maximum retention time. The idea is that a message that has stayed in an SQS queue for a while serves no more purpose after a while, so it gets removed - so as to not block up any listener resources for a long time. You can also set up a DLQ here, but these similarly do not hold onto messages forever. Since you seem to depend on messages surviving at all cost, I would suggest that the scaling/throughput benefit of SQS does not outweigh the difference in approach to messages there.

MITHIRIDI PRASANTH
Software Engineer at LightMetrics · 4 upvotes · 296K views
Needs advice on Amazon MQ and Amazon SQS

I want to schedule a message. Amazon SQS provides a delay of 15 minutes, but I want it in some hours.

Example: Let's say a Message1 is consumed by a consumer A but somehow it failed inside the consumer. I would want to put it in a queue and retry after 4hrs. Can I do this in Amazon MQ? I have seen in some Amazon MQ videos saying scheduling messages can be done. But, I'm not sure how.

Replies (1)
Andres Paredes
Lead Senior Software Engineer at InTouch Technology · 1 upvote · 224.9K views
Recommends Amazon SQS

Mithiridi, I believe you are talking about two different things.

  1. If you need to process messages with delays of more than 15 minutes or at specific times, it's not a good idea to use queues, regardless of the tool (SQS, RabbitMQ, or Amazon MQ). You should consider another approach using a scheduled job.

  2. For dead-letter queues and retry policies, RabbitMQ, for example, doesn't support your use case: https://medium.com/@kiennguyen88/rabbitmq-delay-retry-schedule-with-dead-letter-exchange-31fb25a440fc I'm not sure whether SNS/SQS supports it; they have a maximum delay for delivery (maxDelayTarget) in seconds, but the number is not clear. You can check this out: https://docs.aws.amazon.com/sns/latest/dg/sns-message-delivery-retries.html
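For reference, one workaround that is sometimes used when a queue caps the per-message delay at 15 minutes is to re-enqueue the message in hops until its due time is reached. This is not what the reply above recommends (it suggests a scheduled job instead), and the sketch below only shows the arithmetic: the `due_at` timestamp carried with the message is a hypothetical field, and no AWS calls are made.

```python
import math

# The 15-minute per-message delay cap mentioned in the question, in seconds.
MAX_DELAY_SECONDS = 900


def hops_needed(total_delay_seconds, max_delay=MAX_DELAY_SECONDS):
    """Number of delayed re-sends needed to cover the total delay."""
    if total_delay_seconds <= 0:
        return 0
    return math.ceil(total_delay_seconds / max_delay)


def next_hop(due_at, now, max_delay=MAX_DELAY_SECONDS):
    """Decide what a consumer should do with a message it just received.

    due_at and now are timestamps in seconds. Returns (delay_seconds, ready):
    when ready is True the message should be processed now; otherwise it
    should be re-sent with the returned delay.
    """
    remaining = int(due_at - now)
    if remaining <= 0:
        return 0, True
    return min(remaining, max_delay), False
```

With these numbers, the 4-hour retry from the question would take 16 hops, which is part of why a scheduled job is usually the simpler choice.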

Needs advice on Airflow, Luigi, and Apache Spark

I am so confused. I need a tool that will let me query about 10 different URLs to get lists of objects. Each list will be hundreds or thousands of items long. I then need to fetch a detailed data list for each object. Those detailed lists can have hundreds of elements that could be map/reduced somehow. My batch process sometimes dies halfway through, which means hours of processing are lost. I need something like a directed graph that keeps the results of successful data collection and lets me retry the failed ones, either programmatically or manually, any number of times (0 to forever). I then want it to process everything that has succeeded or been deliberately ignored and load the data store with the aggregation of a couple thousand data points. I know hitting this many endpoints is not good practice, but I can't put collectors on all the endpoints or anything like that. It is pretty much the only way to get the data.

Replies (1)
Gilroy Gordon
Solution Architect at IGonics Limited · 2 upvotes · 286.5K views
Recommends Cassandra

For a non-streaming approach:

You could consider using more checkpoints throughout your Spark jobs. Furthermore, you could consider separating your workload into multiple jobs with an intermediate data store (I suggest Cassandra, or you may choose another based on your preference and availability) to store results, perform aggregations, and store the results of those.

  • Spark Job 1: Fetch data from the 10 URLs and store the data and metadata in a data store (Cassandra)
  • Spark Job 2..n: Check the data store for unprocessed items and continue the aggregation

Alternatively, for a streaming approach:

Treating your data as a stream might also be useful. Spark Streaming allows you to use a checkpoint interval: https://spark.apache.org/docs/latest/streaming-programming-guide.html#checkpointing
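To make the checkpointing idea concrete, here is a minimal sketch in plain Python rather than Spark. It assumes the intermediate data store can be treated as a mapping from item key to result (an in-memory dict stands in for Cassandra here), and `fetch` is a hypothetical function that retrieves the details for one item. Items already in the store are skipped, so a restarted run only redoes the work that is missing.

```python
def collect(keys, fetch, store, max_attempts=3):
    """Fetch every key not already in the store, saving each success at once.

    store is any mutable mapping acting as the checkpoint. Returns the list
    of keys that still failed after max_attempts tries, so they can be
    retried later or ignored.
    """
    failed = []
    for key in keys:
        if key in store:
            # Already collected in an earlier run; do not fetch again.
            continue
        for _ in range(max_attempts):
            try:
                store[key] = fetch(key)  # Checkpoint the result immediately.
                break
            except Exception:
                continue  # Try again until attempts are exhausted.
        else:
            failed.append(key)
    return failed
```

A follow-up job can then aggregate whatever is in the store and decide separately what to do with the keys returned as failed.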

Pros of Airflow
  • Features (53)
  • Task Dependency Management (14)
  • Beautiful UI (12)
  • Cluster of workers (12)
  • Extensibility (10)
  • Open source (6)
  • Complex workflows (5)
  • Python (5)
  • Good api (3)
  • Apache project (3)
  • Custom operators (3)
  • Dashboard (2)

Pros of Amazon SQS
  • Easy to use, reliable (62)
  • Low cost (40)
  • Simple (28)
  • Doesn't need to maintain it (14)
  • It is Serverless (8)
  • Has a max message size (currently 256K) (4)
  • Triggers Lambda (3)
  • Easy to configure with Terraform (3)
  • Delayed delivery up to 15 mins only (3)
  • Delayed delivery up to 12 hours (3)
  • JMS compliant (1)
  • Support for retry and dead letter queue (1)
  • D (1)


Cons of Airflow
  • Observability is not great when the DAGs exceed 250 (2)
  • Running it on a Kubernetes cluster is relatively complex (2)
  • Open source - provides minimum or no support (2)
  • Logical separation of DAGs is not straightforward (1)

Cons of Amazon SQS
  • Has a max message size (currently 256K) (2)
  • Proprietary (2)
  • Difficult to configure (2)
  • Has a maximum 15 minutes of delayed messages only (1)

What is Airflow?

Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command line utilities make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed.
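To illustrate what "following the specified dependencies" means, here is a small sketch that uses Python's standard-library `graphlib` instead of Airflow itself. The task names are hypothetical, and this only computes a valid run order; it is not how the Airflow scheduler is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "extract": set(),
    "clean": {"extract"},
    "validate": {"extract"},
    "load": {"clean", "validate"},
}


def execution_order(dependencies):
    """Return one order in which every task runs after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(execution_order(DEPENDENCIES))
```

Here `clean` and `validate` do not depend on each other, so either may come first; a scheduler with several workers could run them in parallel.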

What is Amazon SQS?

Transmit any volume of data, at any level of throughput, without losing messages or requiring other services to be always available. With SQS, you can offload the administrative burden of operating and scaling a highly available messaging cluster, while paying a low price for only what you use.

What are some alternatives to Airflow and Amazon SQS?
Luigi
It is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
Apache NiFi
An easy to use, powerful, and reliable system to process and distribute data. It supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic.
Jenkins
In a nutshell, Jenkins CI is the leading open-source continuous integration server. Built with Java, it provides over 300 plugins to support building and testing virtually any project.
AWS Step Functions
AWS Step Functions makes it easy to coordinate the components of distributed applications and microservices using visual workflows. Building applications from individual components that each perform a discrete function lets you scale and change applications quickly.
Pachyderm
Pachyderm is an open source MapReduce engine that uses Docker containers for distributed computations.