Amazon SQS vs Celery: What are the differences?
Introduction:
Amazon Simple Queue Service (Amazon SQS) and Celery are both used to manage asynchronous work in distributed applications, but they are different kinds of tools: Amazon SQS is a fully managed message queue service, while Celery is a distributed task queue framework that runs on top of a message broker. The key differences between the two come down to features, architecture, and ease of use.
1. Scalability: Amazon SQS is a fully managed service provided by Amazon Web Services (AWS) that offers unlimited scalability. It automatically scales to handle the load of any number of messages without the need for manual intervention. On the other hand, Celery requires manual configuration and setup for scalability, making it less suitable for high loads or sudden spikes in traffic.
2. Serverless Architecture: Amazon SQS operates on a serverless model in which the underlying infrastructure is managed by AWS, so developers do not need to worry about provisioning or maintaining servers. In contrast, Celery requires you to deploy and manage your own broker and worker processes, which adds complexity and overhead in terms of infrastructure management.
3. Message Retention: Amazon SQS retains messages for a configurable period of up to 14 days, so messages are held even when no consumer is available. Celery, on the other hand, does not provide built-in message retention and relies on the broker and external storage systems for long-term message storage.
4. FIFO support: Amazon SQS offers First-In-First-Out (FIFO) queues, which preserve message order and ensure that messages are processed in the order they were sent; this is particularly useful for applications that require strict ordering. Celery, on the other hand, does not guarantee strict message ordering, since execution depends on task priority and worker availability (see the sketch after this list).
5. Integration with AWS ecosystem: Amazon SQS seamlessly integrates with other AWS services, such as AWS Lambda and Amazon S3, which allows for easy building of serverless applications and event-driven architectures. Celery, being a standalone open-source project, may require additional configuration and integration efforts to work with other AWS services.
6. Cost Structure: Amazon SQS follows a pay-as-you-go pricing model in which users are charged based on the number of requests made, allowing costs to track actual usage. Celery itself is open source and has no direct licensing cost, but users need to account for the infrastructure and operational costs of running the broker and the Celery workers.
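To make points 3 and 4 concrete, here is a minimal boto3 sketch that creates a FIFO queue with 14-day retention and sends an ordered message. The region, queue name, and message body are placeholder values, and AWS credentials are assumed to be configured already:

```python
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")

# FIFO queue names must end in ".fifo"; retention is set in seconds (14 days max).
queue = sqs.create_queue(
    QueueName="orders.fifo",
    Attributes={
        "FifoQueue": "true",
        "ContentBasedDeduplication": "true",
        "MessageRetentionPeriod": str(14 * 24 * 60 * 60),  # 1,209,600 seconds
    },
)

# Messages sharing a MessageGroupId are delivered in the order they were sent.
sqs.send_message(
    QueueUrl=queue["QueueUrl"],
    MessageBody='{"order_id": 42}',
    MessageGroupId="customer-7",
)
```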
Summary: In summary, Amazon SQS and Celery differ in scalability, architecture, message retention, FIFO support, integration with the AWS ecosystem, and cost structure. Amazon SQS offers effectively unlimited scalability, a serverless architecture, extended message retention, and seamless integration with other AWS services, while Celery requires manual configuration for scaling, deployment and management of brokers and workers, external storage for message retention, and additional effort to integrate with AWS services. Additionally, Amazon SQS follows a cost-efficient pay-as-you-go pricing model, while Celery is free but requires you to account for infrastructure and operational costs.
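For contrast with the SQS calls above, here is a minimal Celery sketch. It assumes a broker (RabbitMQ on localhost in this example) and at least one worker process that you run yourself; the task name and body are illustrative:

```python
from celery import Celery

# Celery is a task framework: it needs an external broker (RabbitMQ, Redis,
# or even SQS) plus worker processes that you deploy and scale yourself.
app = Celery("tasks", broker="amqp://guest@localhost//")

@app.task
def resize_image(image_id):
    # ... do the actual work here ...
    return image_id

# Callers enqueue work asynchronously; a worker started with
# `celery -A tasks worker` picks it up from the broker.
resize_image.delay(42)
```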
Hi! I am creating a scraping system in Django, which involves long-running tasks of between 1 minute and 1 day. As I am new to message brokers and task queues, I need advice on which architecture to use for my system (Amazon SQS, RabbitMQ, or Celery). The system should be autoscalable using Kubernetes (K8s) based on the number of pending tasks in the queue.
Hello, I highly recommend Apache Kafka; to me it's the best. You can deploy it in cluster mode inside K8s, so you can have a highly available system that is also auto-scalable.
Good luck
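Whichever queue you pick, the autoscaling signal the question asks about is just the queue depth. If you went the Amazon SQS route, for example, that number is the ApproximateNumberOfMessages attribute, which a Kubernetes autoscaler such as KEDA (which has a built-in SQS scaler) or a custom-metrics adapter can act on. A rough sketch, with the queue URL as a placeholder:

```python
import boto3

sqs = boto3.client("sqs")

def pending_tasks(queue_url: str) -> int:
    """Return the approximate number of messages waiting in the queue."""
    resp = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    return int(resp["Attributes"]["ApproximateNumberOfMessages"])

# A KEDA ScaledObject (or an HPA on a custom metric) would scale the
# scraper workers up and down against this number.
print(pending_tasks("https://sqs.us-east-1.amazonaws.com/123456789012/scrape-jobs"))
```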
I am just a beginner at these two technologies.
Problem statement: I am getting a lakh (100,000) of users from SQL Server, for whom I need to create caches in MongoDB by making different REST API requests.
Here these users can be treated as messages. Each REST API request is a task.
I am confused about whether I should go for RabbitMQ alone or Celery.
If I go with RabbitMQ, I would prefer to use Python with the Pika module. But the challenge with Pika is that it is not thread-safe, so I cannot find a way to execute a lakh of API requests in parallel across multiple threads using Pika.
If I have to go with Celery, I don't know how I can achieve better scalability in executing these API requests in parallel.
For large numbers of small tasks and caches I have had good luck with Redis and RQ. I have not personally used Celery, but I am fairly sure it would scale well, and I have not used RabbitMQ for anything besides communication between services. If you prefer Python, my suggestions should feel comfortable.
Sorry, I do not have more information.
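For the Celery option the question raises, parallelism typically comes from worker concurrency plus fan-out primitives such as group. A minimal sketch, assuming a Redis broker on localhost and a hypothetical API endpoint; the MongoDB write is omitted:

```python
import requests
from celery import Celery, group

# Hypothetical setup: Redis as the broker; result backend omitted.
app = Celery("cache_builder", broker="redis://localhost:6379/0")

@app.task(
    autoretry_for=(requests.RequestException,),
    retry_backoff=True,
    max_retries=3,
)
def fetch_and_cache(user_id):
    resp = requests.get(f"https://api.example.com/users/{user_id}")
    resp.raise_for_status()
    # ... write resp.json() into the MongoDB cache here ...
    return user_id

# Fan out 100,000 tasks; they run in parallel across however many worker
# processes you start, e.g. `celery -A cache_builder worker -c 50`.
group(fetch_and_cache.s(uid) for uid in range(100_000)).apply_async()
```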
Hi, we have a ZeroMQ setup in a push/pull pattern, and we are starting to see more traffic and cases where the service is unavailable or stuck. We want to:
* Not lose messages during service outages
* Safely restart the service without losing messages (ZeroMQ seems to require manually closing the socket in the receiver before a restart)
Do you have experience with this setup with ZeroMQ? Would you suggest RabbitMQ or Amazon SQS (we are on AWS) instead? Something else?
Thank you for your time
ZeroMQ is fast, but you need to build reliability yourself; there are a number of patterns described in the ZeroMQ guide. I have used RabbitMQ before, which gives a lot of functionality out of the box; you can probably use the worker queues example from the tutorial, and it can also persist messages in the queue.
I haven't used Amazon SQS before. Another tool you could use is Kafka.
Both would do the trick, but there are some nuances. We work with both.
From the sound of it, your main focus is "not losing messages". In that case, I would go with RabbitMQ with a high availability policy (ha-mode=all) and a main/retry/error queue pattern.
Push messages to an exchange, which sends them to the main queue. If an error occurs, push the errored-out message to the retry exchange, which forwards it to the retry queue. Give the retry queue an x-message-ttl and set the main exchange as its dead-letter-exchange, so that expired messages flow back to the main queue for another attempt. If your message has been retried several times, push it to the error exchange, where the message can remain until someone has time to look at it.
This is a very useful and resilient pattern that ensures you never lose messages. With the high availability policy, you make sure that if one of your RabbitMQ nodes dies, another can take over, and messages have already been mirrored to it.
This is not really possible with SQS, because SQS is much more focused on throughput and scaling. Combined with SNS it can do interesting things like deduplication of messages. That said, one thing core to its design is that messages have a maximum retention time: the idea is that a message that has sat in an SQS queue for a long time no longer serves a purpose, so it gets removed rather than tying up listener resources indefinitely. You can also set up a DLQ here, but DLQs similarly do not hold onto messages forever. Since you seem to depend on messages surviving at all costs, I would suggest that the scaling/throughput benefit of SQS does not outweigh its different approach to message retention.
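A rough pika sketch of the main/retry/error topology described above. The exchange, queue, and routing-key names are illustrative, and the high availability policy itself is configured on the broker (ha-mode=all), not in code:

```python
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch = conn.channel()

# Exchanges for the three stages.
ch.exchange_declare("main", exchange_type="direct", durable=True)
ch.exchange_declare("retry", exchange_type="direct", durable=True)
ch.exchange_declare("error", exchange_type="direct", durable=True)

# Main queue: consumers read from here.
ch.queue_declare("main.q", durable=True)
ch.queue_bind("main.q", "main", routing_key="task")

# Retry queue: no consumers; messages expire after the TTL and are
# dead-lettered back to the main exchange for another attempt.
ch.queue_declare(
    "retry.q",
    durable=True,
    arguments={
        "x-message-ttl": 30000,              # retry delay in milliseconds
        "x-dead-letter-exchange": "main",
        "x-dead-letter-routing-key": "task",
    },
)
ch.queue_bind("retry.q", "retry", routing_key="task")

# Error queue: messages that exceeded their retry budget wait here
# until someone has time to look at them.
ch.queue_declare("error.q", durable=True)
ch.queue_bind("error.q", "error", routing_key="task")
```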
I want to schedule a message. Amazon SQS provides a delay of up to 15 minutes, but I want a delay of several hours.
Example: let's say Message1 is consumed by consumer A but fails inside the consumer. I would want to put it back in a queue and retry after 4 hours. Can I do this in Amazon MQ? I have seen some Amazon MQ videos saying message scheduling can be done, but I'm not sure how.
Mithiridi, I believe you are talking about two different things.
1. If you need to process messages with delays of more than 15 minutes, or at specific times, it's not a good idea to use queues, regardless of the tool (SQS, RabbitMQ, or Amazon MQ); you should consider another approach using a scheduled job.
2. For dead-letter queues and retry policies, RabbitMQ, for example, doesn't support your use case: https://medium.com/@kiennguyen88/rabbitmq-delay-retry-schedule-with-dead-letter-exchange-31fb25a440fc I'm not sure whether SNS/SQS support it; they have a maximum delay for delivery retries (maxDelayTarget) in seconds, but the maximum value isn't clear. You can check this out: https://docs.aws.amazon.com/sns/latest/dg/sns-message-delivery-retries.html
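A rough sketch of the scheduled-job approach suggested in point 1: the consumer records failed messages with a retry-at timestamp instead of re-queueing them immediately, and a periodic job (cron, Celery beat, a scheduled Lambda, etc.) re-enqueues whatever is due. The queue URL is a placeholder, and the in-memory store stands in for whatever database table you would actually use:

```python
import time
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/work"  # placeholder

def schedule_retry(store, body, delay_seconds=4 * 3600):
    """Consumer calls this on failure instead of re-queueing immediately."""
    store.append({"body": body, "retry_at": time.time() + delay_seconds})

def requeue_due_messages(store):
    """Run this on a schedule; it re-enqueues messages whose delay has elapsed."""
    now = time.time()
    still_waiting = []
    for item in store:
        if item["retry_at"] <= now:
            sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=item["body"])
        else:
            still_waiting.append(item)
    store[:] = still_waiting
```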