What is Amazon SQS?
Who uses Amazon SQS?
Amazon SQS Integrations
Why developers like Amazon SQS?
Here are some stack decisions, common use cases and reviews by companies and developers who chose Amazon SQS in their tech stack.
Recently we were looking at a few robust and cost-effective ways of replicating the data that resides in our production MongoDB to a PostgreSQL database for data warehousing and business intelligence.
We set ourselves the following criteria for the optimal tool that would do this job: - The data replication must be near real-time, yet it should NOT impact the production database - The data replication must be horizontally scalable (based on the load), asynchronous & crash-resilient
Based on the above criteria, we selected the following tools to perform the end to end data replication:
We chose MongoDB Stitch for picking up the changes in the source database. It is the serverless platform from MongoDB. One of the services offered by MongoDB Stitch is Stitch Triggers. Using stitch triggers, you can execute a serverless function (in Node.js) in real time in response to changes in the database. When there are a lot of database changes, Stitch automatically "feeds forward" these changes through an asynchronous queue.
We chose Amazon SQS as the pipe / message backbone for communicating the changes from MongoDB to our own replication service. Interestingly enough, MongoDB stitch offers integration with AWS services.
In the Node.js function, we wrote minimal functionality to communicate the database changes (insert / update / delete / replace) to Amazon SQS.
Next we wrote a minimal micro-service in Python to listen to the message events on SQS, pickup the data payload & mirror the DB changes on to the target Data warehouse. We implemented source data to target data translation by modelling target table structures through SQLAlchemy . We deployed this micro-service as AWS Lambda with Zappa. With Zappa, deploying your services as event-driven & horizontally scalable Lambda service is dumb-easy.
In the end, we got to implement a highly scalable near realtime Change Data Replication service that "works" and deployed to production in a matter of few days!
In order to accurately measure & track user behaviour on our platform we moved over quickly from the initial solution using Google Analytics to a custom-built one due to resource & pricing concerns we had.
While this does sound complicated, it’s as easy as clients sending JSON blobs of events to Amazon Kinesis from where we use AWS Lambda & Amazon SQS to batch and process incoming events and then ingest them into Google BigQuery. Once events are stored in BigQuery (which usually only takes a second from the time the client sends the data until it’s available), we can use almost-standard-SQL to simply query for data while Google makes sure that, even with terabytes of data being scanned, query times stay in the range of seconds rather than hours. Before ingesting their data into the pipeline, our mobile clients are aggregating events internally and, once a certain threshold is reached or the app is going to the background, sending the events as a JSON blob into the stream.
In the past we had workers running that continuously read from the stream and would validate and post-process the data and then enqueue them for other workers to write them to BigQuery. We went ahead and implemented the Lambda-based approach in such a way that Lambda functions would automatically be triggered for incoming records, pre-aggregate events, and write them back to SQS, from which we then read them, and persist the events to BigQuery. While this approach had a couple of bumps on the road, like re-triggering functions asynchronously to keep up with the stream and proper batch sizes, we finally managed to get it running in a reliable way and are very happy with this solution today.
#ServerlessTaskProcessing #GeneralAnalytics #RealTimeDataProcessing #BigDataAsAService
At FundsCorner, we are on a mission to enable fast accessible credit to India’s Kirana Stores. We are an early stage startup with an ultra small Engineering team. All the tech decisions we have made until now are based on our core philosophy: "Build usable products fast".
Based on the above fundamentals, we chose Python as our base language for all our APIs and micro-services. It is ultra easy to start with, yet provides great libraries even for the most complex of use cases. Our entire backend stack runs on Python and we cannot be more happy with it! If you are looking to deploy your API as server-less, Python provides one of the least cold start times.
We build our APIs with Flask. For backend database, our natural choice was MongoDB. It frees up our time from complex database specifications - we instead use our time in doing sensible data modelling & once we finalize the data model, we integrate it into Flask using Swagger UI. Mongo supports complex queries to cull out difficult data through aggregation framework & we have even built an internal framework called "Poetry", for aggregation queries.
Our web apps are built on Vue.js , Vuetify and vuex. Initially we debated a lot around choosing Vue.js or React , but finally settled with Vue.js, mainly because of the ease of use, fast development cycles & awesome set of libraries and utilities backing Vue.
You simply cannot go wrong with Vue.js . Great documentation, the library is ultra compact & is blazing fast. Choosing Vue.js was one of the critical decisions made, which enabled us to launch our web app in under a month (which otherwise would have taken 3 months easily). For those folks who are looking for big names, Adobe, and Alibaba and Gitlab are using Vue.
By choosing Vuetify, we saved thousands of person hours in designing the CSS files. Vuetify contains all key material components for designing a smooth User experience & it just works! It's an awesome framework. All of us at FundsCorner are now lifelong fanboys of Vue.js and Vuetify.
On the infrastructure side, all our API services and backend services are deployed as server less micro-services through Zappa. Zappa makes your life super easy by packaging everything that is required to deploy your code as AWS Lambda. We are now addicted to the single - click deploys / updates through Zappa. Try it out & you will convert!
Also, if you are using Zappa, you can greatly simplify your CI / CD pipelines. Do try it! It's just awesome! and... you will be astonished by the savings you have made on AWS bills at end of the month.
Our CI / CD pipelines are built using GitLab CI. The documentation is very good & it enables you to go from from concept to production in minimal time frame.
We use Sentry for all crash reporting and resolution. Pro tip, they do have handlers for AWS Lambda , which made our integration super easy.
All our micro-services including APIs are event-driven. Our background micro-services are message oriented & we use Amazon SQS as our message pipe. We have our own in-house workflow manager to orchestrate across micro - services.
We host our static websites on Netlify. One of the cool things about Netlify is the automated CI / CD on git push. You just do a git push to deploy! Again, it is super simple to use and it just works. We were dogmatic about going server less even on static web sites & you can go server less on Netlify in a few minutes. It's just a few clicks away.
We use Google Compute Engine, especially Google Vision for our AI experiments.
For Ops automation, we use Slack. Slack provides a super-rich API (through Slack App) through which you can weave magical automation on boring ops tasks.
In 2010 we made the very difficult decision to entirely re-engineer our existing monolithic LAMP application from the ground up in order to address some growing concerns about it's long term viability as a platform.
Full application re-write is almost always never the answer, because of the risks involved. However the situation warranted drastic action as it was clear that the existing product was going to face severe scaling issues. We felt it better address these sooner rather than later and also take the opportunity to improve the international architecture and also to refactor the database in. order that it better matched the changes in core functionality.
PostgreSQL was chosen for its reputation as being solid ACID compliant database backend, it was available as an offering AWS RDS service which reduced the management overhead of us having to configure it ourselves. In order to reduce read load on the primary database we implemented an Elasticsearch layer for fast and scalable search operations. Synchronisation of these indexes was to be achieved through the use of Sidekiq's Redis based background workers on Amazon ElastiCache. Again the AWS solution here looked to be an easy way to keep our involvement in managing this part of the platform at a minimum. Allowing us to focus on our core business.
Rails ls was chosen for its ability to quickly get core functionality up and running, its MVC architecture and also its focus on Test Driven Development using RSpec and Selenium with Travis CI providing continual integration. We also liked Ruby for its terse, clean and elegant syntax. Though YMMV on that one!
Unicorn was chosen for its continual deployment and reputation as a reliable application server, nginx for its reputation as a fast and stable reverse-proxy. We also took advantage of the Amazon CloudFront CDN here to further improve performance by caching static assets globally.
We tried to strike a balance between having control over management and configuration of our core application with the convenience of being able to leverage AWS hosted services for ancillary functions (Amazon SES , Amazon SQS Amazon Route 53 all hosted securely inside Amazon VPC of course!).
Whilst there is some compromise here with potential vendor lock in, the tasks being performed by these ancillary services are no particularly specialised which should mitigate this risk. Furthermore we have already containerised the stack in our development using Docker environment, and looking to how best to bring this into production - potentially using Amazon EC2 Container Service
We migrated from Amazon SQS + Shoryuken to Sidekiq in order to have at-most-once delivery out of the box and more flexibility.
The UI builtin Rails makes it smoother for development and QA. Through the sidekiq rails engine we can easily see & understand which job is/was/will be executed, and even get some stats for free. Compared to SQS, we lose in scalability (need to manage the underlying Redis instance) but this is not so critical right now for our business size and the PROs clearly outweigh the CONs. Plugins allow to easily add distributed CRON scheduled jobs in there for almost free, and this is a core feature for us, so we no longer need to maintain a "scheduler" instance and we make our CRON jobs more resilient. The Sidekiq UI can easily be tweaked and for instance we have added a column that translates the CRON syntax into a human readable string, so it's easy for our Q/A to check whether the job is scheduled appropriately.
We still use Amazon SQS for some other apps, but no longer for our main Rails app.
delayed_job is a great Rails background job library for new projects, as it only uses what you already have: a relational database. We happily used it during the company’s first two years.
But it started to falter as our web and database transactions significantly grew. Our app interacted with users via SMS texts sent inside background jobs. Because the delayed_job daemon ran every couple seconds, this meant that users often waited several long seconds before getting text replies, which was not acceptable. Moreover, job processing was done inside AWS Elastic Beanstalk web instances, which were already under stress and not meant to handle jobs.
We needed a fast background job system that could process jobs in near real-time and integrate well with AWS. Sidekiq is a fast and popular Ruby background job library, but it does not leverage the Elastic Beanstalk worker architecture, and you have to maintain a Redis instance.
We ended up choosing active-elastic-job, which seamlessly integrates with worker instances and Amazon SQS. SQS is a fast queue and you don’t need to worry about infrastructure or scaling, as AWS handles it for you.
We noticed significant performance gains immediately after making the switch.
Amazon SQS's features
- A queue can be created in any region.
- The message payload can contain up to 256KB of text in any format. Each 64KB ‘chunk’ of payload is billed as 1 request. For example, a single API call with a 256KB payload will be billed as four requests.
- Messages can be sent, received or deleted in batches of up to 10 messages or 256KB. Batches cost the same amount as single messages, meaning SQS can be even more cost effective for customers that use batching.
- Long polling reduces extraneous polling to help you minimize cost while receiving new messages as quickly as possible. When your queue is empty, long-poll requests wait up to 20 seconds for the next message to arrive. Long poll requests cost the same amount as regular requests.
- Messages can be retained in queues for up to 14 days.
- Messages can be sent and read simultaneously.
- Developers can get started with Amazon SQS by using only five APIs: CreateQueue, SendMessage, ReceiveMessage, ChangeMessageVisibility, and DeleteMessage. Additional APIs are available to provide advanced functionality.