Hi! I am creating a scraping system in Django, which involves long running tasks between 1 minute & 1 Day. As I am new to Message Brokers and Task Queues, I need advice on which architecture to use for my system. ( Amazon SQS, RabbitMQ, or Celery). The system should be autoscalable using Kubernetes(K8) based on the number of pending tasks in the queue.
I am just a beginner at these two technologies.
Problem statement: I am getting lakh of users from the sequel server for whom I need to create caches in MongoDB by making different REST API requests.
Here these users can be treated as messages. Each REST API request is a task.
I am confused about whether I should go for RabbitMQ alone or Celery.
If I have to go with RabbitMQ, I prefer to use python with Pika module. But the challenge with Pika is, it is not thread-safe. So I am not finding a way to execute a lakh of API requests in parallel using multiple threads using Pika.
If I have to go with Celery, I don't know how I can achieve better scalability in executing these API requests in parallel.
For large amounts of small tasks and caches I have had good luck with Redis and RQ. I have not personally used celery but I am fairly sure it would scale well, and I have not used RabbitMQ for anything besides communication between services. If you prefer python my suggestions should feel comfortable.
Sorry I do not have a more information
Hi, we are in a ZMQ set up in a push/pull pattern, and we currently start to have more traffic and cases that the service is unavailable or stuck. We want to: * Not loose messages in services outages * Safely restart service without losing messages (ZeroMQ seems to need to close the socket in the receiver before restart manually)
Do you have experience with this setup with ZeroMQ? Would you suggest RabbitMQ or Amazon SQS (we are in AWS setup) instead? Something else?
Thank you for your time
ZeroMQ is fast but you need to build build reliability yourself. There are a number of patterns described in the zeromq guide. I have used RabbitMQ before which gives lot of functionality out of the box, you can probably use the
worker queues example from the tutorial, it can also persists messages in the queue.
I haven't used Amazon SQS before. Another tool you could use is Kafka.
Maybe not an obvious comparison with Kafka, since Kafka is pretty different from rabbitmq. But for small service, Rabbit as a pubsub platform is super easy to use and pretty powerful. Kafka as an alternative was the original choice, but its really a kind of overkill for a small-medium service. Especially if you are not planning to use k8s, since pure docker deployment can be a pain because of networking setup. Google PubSub was another alternative, its actually pretty cheap, but I never tested it since Rabbit was matching really good for mailing/notification services.
In addition to being a lot cheaper, Google Cloud Pub/Sub allowed us to not worry about maintaining any more infrastructure that needed.
We moved from a self-hosted RabbitMQ over to CloudAMQP and decided that since we use GCP anyway, why not try their managed PubSub?
It is one of the better decisions that we made, and we can just focus about building more important stuff!
Sign up to add or upvote prosMake informed product decisions
Sign up to add or upvote consMake informed product decisions
What is Amazon SQS?
What is Kafka?
Need advice about which tool to choose?Ask the StackShare community!
Sign up to get full access to all the companiesMake informed product decisions
Sign up to get full access to all the tool integrationsMake informed product decisions
I developed one of the largest queue based medical results delivery systems in the world, 18,000+ queues and still growing over a decade later all using MQSeries, later called Websphere MQ. When I left that company I started using RabbitMQ after doing some research on free offerings.. it works brilliantly and is incredibly flexible from small scale single instance use to large scale multi-server - multi-site architectures.
If you can think in queues then RabbitMQ should be a viable solution for integrating disparate systems.
In the beginning we thought we wanted to start using something like RabbitMQ or maybe Kafka or maybe ActiveMQ. Back then we only had a few developers and no ops people. That has changed now, but we didn't really look forward to setting up a queuing cluster and making sure that all works.
What we did instead was we looked at what services Amazon offers to see if we can use those to build our own messaging system within those services. That's basically what we did. We wrote some clients in Ruby that can basically do the entire orchestration for us, and we run all our messaging on both SNS and SQS. Basically what you can do in Amazon services is you can use Amazon Simple Notification Service, so SNS, for creating topics and you can use queues to subscribe to these topics. That's basically all you need for a messaging system. You don't have to worry about scalability at all. That's what really appealed to us.
Front-end messages are logged to Kafka by our API and application servers. We have batch processing (on the middle-left) and real-time processing (on the middle-right) pipelines to process the experiment data. For batch processing, after daily raw log get to s3, we start our nightly experiment workflow to figure out experiment users groups and experiment metrics. We use our in-house workflow management system Pinball to manage the dependencies of all these MapReduce jobs.
This isn't exactly low-latency (10s to 100s of milliseconds), but it has good throughput and a simple API. There is good reliability, and there is no configuration necessary to get up and running. A hosted queue is important when trying to move fast.
The poster child for scalable messaging systems, RabbitMQ has been used in countless large scale systems as the messaging backbone of any large cluster, and has proven itself time and again in many production settings.
Rabbit acts as our coordinator for all actions that happen during game time. All worker containers connect to rabbit in order to receive game events and emit their own events when applicable.
Used as central Message Broker; off-loading tasks to be executed asynchronous, used as communication tool between different microservices, used as tool to handle peaks in incoming data, etc.
RabbitMQ is the enterprise message bus for our platform, providing infrastructure for managing our ETL queues, real-time event notifications for applications, and audit logging.
RabbitMQ is an all purpose queuing service for our stack. We use it for user facing jobs as well as keeping track of behind the scenes jobs.
SQS is the bridge between our new Lambda services and our incumbent Rails applications. Extremely easy to use when you're already using other AWS infrastructure.
Building out real-time streaming server to present data insights to Coolfront Mobile customers and internal sales and marketing teams.
Primary message queue. Enqueueing operations revert to a local file-system-based queue when SQS is unavailable.
I can't afford to lose data if Dynamo throttles my writes, so everything goes into a message queue first.