Hi! I am creating a scraping system in Django, which involves long running tasks between 1 minute & 1 Day. As I am new to Message Brokers and Task Queues, I need advice on which architecture to use for my system. ( Amazon SQS, RabbitMQ, or Celery). The system should be autoscalable using Kubernetes(K8) based on the number of pending tasks in the queue.
I am just a beginner at these two technologies.
Problem statement: I am getting lakh of users from the sequel server for whom I need to create caches in MongoDB by making different REST API requests.
Here these users can be treated as messages. Each REST API request is a task.
I am confused about whether I should go for RabbitMQ alone or Celery.
If I have to go with RabbitMQ, I prefer to use python with Pika module. But the challenge with Pika is, it is not thread-safe. So I am not finding a way to execute a lakh of API requests in parallel using multiple threads using Pika.
If I have to go with Celery, I don't know how I can achieve better scalability in executing these API requests in parallel.
For large amounts of small tasks and caches I have had good luck with Redis and RQ. I have not personally used celery but I am fairly sure it would scale well, and I have not used RabbitMQ for anything besides communication between services. If you prefer python my suggestions should feel comfortable.
Sorry I do not have a more information
Hi, we are in a ZMQ set up in a push/pull pattern, and we currently start to have more traffic and cases that the service is unavailable or stuck. We want to: * Not loose messages in services outages * Safely restart service without losing messages (ZeroMQ seems to need to close the socket in the receiver before restart manually)
Do you have experience with this setup with ZeroMQ? Would you suggest RabbitMQ or Amazon SQS (we are in AWS setup) instead? Something else?
Thank you for your time
ZeroMQ is fast but you need to build build reliability yourself. There are a number of patterns described in the zeromq guide. I have used RabbitMQ before which gives lot of functionality out of the box, you can probably use the
worker queues example from the tutorial, it can also persists messages in the queue.
I haven't used Amazon SQS before. Another tool you could use is Kafka.
Maybe not an obvious comparison with Kafka, since Kafka is pretty different from rabbitmq. But for small service, Rabbit as a pubsub platform is super easy to use and pretty powerful. Kafka as an alternative was the original choice, but its really a kind of overkill for a small-medium service. Especially if you are not planning to use k8s, since pure docker deployment can be a pain because of networking setup. Google PubSub was another alternative, its actually pretty cheap, but I never tested it since Rabbit was matching really good for mailing/notification services.
In addition to being a lot cheaper, Google Cloud Pub/Sub allowed us to not worry about maintaining any more infrastructure that needed.
We moved from a self-hosted RabbitMQ over to CloudAMQP and decided that since we use GCP anyway, why not try their managed PubSub?
It is one of the better decisions that we made, and we can just focus about building more important stuff!
Sign up to add or upvote prosMake informed product decisions
Sign up to add or upvote consMake informed product decisions
What is ActiveMQ?
What is CloudAMQP?
Need advice about which tool to choose?Ask the StackShare community!
Sign up to get full access to all the companiesMake informed product decisions
Sign up to get full access to all the tool integrationsMake informed product decisions