What is Amazon EC2 Container Service?
Who uses Amazon EC2 Container Service?
Amazon EC2 Container Service Integrations
Why developers like Amazon EC2 Container Service?
Here are some stack decisions, common use cases and reviews by companies and developers who chose Amazon EC2 Container Service in their tech stack.
The algorithms and data infrastructure at Stitch Fix is housed in #AWS. Data acquisition is split between events flowing through Kafka, and periodic snapshots of PostgreSQL DBs. We store data in an Amazon S3 based data warehouse. Apache Spark on Yarn is our tool of choice for data movement and #ETL. Because our storage layer (s3) is decoupled from our processing layer, we are able to scale our compute environment very elastically. We have several semi-permanent, autoscaling Yarn clusters running to serve our data processing needs. While the bulk of our compute infrastructure is dedicated to algorithmic processing, we also implemented Presto for adhoc queries and dashboards.
Beyond data movement and ETL, most #ML centric jobs (e.g. model training and execution) run in a similarly elastic environment as containers running Python and R code on Amazon EC2 Container Service clusters. The execution of batch jobs on top of ECS is managed by Flotilla, a service we built in house and open sourced (see https://github.com/stitchfix/flotilla-os).
At Stitch Fix, algorithmic integrations are pervasive across the business. We have dozens of data products actively integrated systems. That requires serving layer that is robust, agile, flexible, and allows for self-service. Models produced on Flotilla are packaged for deployment in production using Khan, another framework we've developed internally. Khan provides our data scientists the ability to quickly productionize those models they've developed with open source frameworks in Python 3 (e.g. PyTorch, sklearn), by automatically packaging them as Docker containers and deploying to Amazon ECS. This provides our data scientist a one-click method of getting from their algorithms to production. We then integrate those deployments into a service mesh, which allows us to A/B test various implementations in our product.
For more info:
- Our Algorithms Tour: https://algorithms-tour.stitchfix.com/
- Our blog: https://multithreaded.stitchfix.com/blog/
- Careers: https://multithreaded.stitchfix.com/careers/
#DataScience #DataStack #Data
I'm planning to create a web application and also a mobile application to provide a very good shopping experience to the end customers. Shortly, my application will be aggregate the product details from difference sources and giving a clear picture to the user that when and where to buy that product with best in Quality and cost.
I have planned to develop this in many milestones for adding N number of features and I have picked my first part to complete the core part (aggregate the product details from different sources).
As per my work experience and knowledge, I have chosen the followings stacks to this mission.
Service: I have planned to use Java as the main business layer language as I have 7+ years of experience on this I believe I can do better work using Java than other languages. In addition, I'm thinking to use the stacks Node.js.
Database and ORM: I'm gonna pick MySQL as DB and Hibernate as ORM since I have a piece of good knowledge and also work experience on this combination.
Search Engine: I need to deal with a large amount of product data and it's in-detailed info to provide enough details to end user at the same time I need to focus on the performance area too. so I have decided to use Solr as a search engine for product search and suggestions. In addition, I'm thinking to replace Solr by Elasticsearch once explored/reviewed enough about Elasticsearch.
Host: As of now, my plan to complete the application with decent features first and deploy it in a free hosting environment like Docker and Heroku and then once it is stable then I have planned to use the AWS products Amazon S3, EC2, Amazon RDS and Amazon Route 53. I'm not sure about Microsoft Azure that what is the specialty in it than Heroku and Amazon EC2 Container Service. Anyhow, I will do explore these once again and pick the best suite one for my requirement once I reached this level.
Build and Repositories: I have decided to choose Apache Maven and Git as these are my favorites and also so popular on respectively build and repositories.
Additional Utilities :) - I would like to choose Codacy for code review as their Startup plan will be very helpful to this application. I'm already experienced with Google CheckStyle and SonarQube even I'm looking something on Codacy.
Happy Coding! Suggestions are welcome! :)
The recent move of our CI/CD tooling to AWS CodeBuild / AWS CodeDeploy (with GitHub ) as well as moving to Amazon EC2 Container Service / AWS Lambda for our deployment architecture for most of our services has helped us significantly reduce our deployment times while improving both feature velocity and overall reliability. In one extreme case, we got one service down from 90 minutes to a very reasonable 15 minutes. Container-based build and deployments have made so many things simpler and easier and the integration between the tools has been helpful. There is still some work to do on our service mesh & API proxy approach to further simplify our environment.
To load data from our Amazon S3 data warehouse into the Elasticsearch cluster, I developed a Spark application that uses PySpark to extract data from S3, partition, then batch-send each partition to Elasticsearch to increase parallelism. The Spark job enables fielddata: true for text columns with low cardinality to allow sub-aggregations by text columns and prevents data duplication by adding a unique _id field to each row in the dataframe.
The job can then be run by data scientists in Flotilla, an internal data platform tool for running jobs on Amazon EC2 Container Service, with environment variables specifying which schema and table to load.
In 2010 we made the very difficult decision to entirely re-engineer our existing monolithic LAMP application from the ground up in order to address some growing concerns about it's long term viability as a platform.
Full application re-write is almost always never the answer, because of the risks involved. However the situation warranted drastic action as it was clear that the existing product was going to face severe scaling issues. We felt it better address these sooner rather than later and also take the opportunity to improve the international architecture and also to refactor the database in. order that it better matched the changes in core functionality.
PostgreSQL was chosen for its reputation as being solid ACID compliant database backend, it was available as an offering AWS RDS service which reduced the management overhead of us having to configure it ourselves. In order to reduce read load on the primary database we implemented an Elasticsearch layer for fast and scalable search operations. Synchronisation of these indexes was to be achieved through the use of Sidekiq's Redis based background workers on Amazon ElastiCache. Again the AWS solution here looked to be an easy way to keep our involvement in managing this part of the platform at a minimum. Allowing us to focus on our core business.
Rails ls was chosen for its ability to quickly get core functionality up and running, its MVC architecture and also its focus on Test Driven Development using RSpec and Selenium with Travis CI providing continual integration. We also liked Ruby for its terse, clean and elegant syntax. Though YMMV on that one!
Unicorn was chosen for its continual deployment and reputation as a reliable application server, nginx for its reputation as a fast and stable reverse-proxy. We also took advantage of the Amazon CloudFront CDN here to further improve performance by caching static assets globally.
We tried to strike a balance between having control over management and configuration of our core application with the convenience of being able to leverage AWS hosted services for ancillary functions (Amazon SES , Amazon SQS Amazon Route 53 all hosted securely inside Amazon VPC of course!).
Whilst there is some compromise here with potential vendor lock in, the tasks being performed by these ancillary services are no particularly specialised which should mitigate this risk. Furthermore we have already containerised the stack in our development using Docker environment, and looking to how best to bring this into production - potentially using Amazon EC2 Container Service
We started using Amazon EC2 Container Service 3 years ago because it was the easiest containers orchestration tool to start with. At the time it was missing a lot of features compared to other tools, but it was still the fastest way to deploy a container on AWS. As with any AWS product, over time they caught up and improved it significantly. Today it probably one of the best tools in its category. It might not have all the feature Kubernetes has, but it also has less complexity. And it definitely has all the features a small company/team needs.
Amazon EC2 Container Service's Features
- Docker Compatibility
- Managed Clusters
- Programmatic Control
- Task Definitions
- Docker Repository