As Mixmax began to scale super quickly, with more and more customers joining the platform, we started to see that the Meteor app was still having a lot of trouble scaling due to how it tried to provide its reactivity layer. To be honest, this led to a brutal summer of playing Galaxy container whack-a-mole as containers would saturate their CPU and become unresponsive. I’ll never forget hacking away at building a new microservice to relieve the load on the system so that we’d stop getting paged every 30-40 minutes. Luckily, we’ve never had to do that again! After stabilizing the system, we had to build out two more microservices to provide the necessary reactivity and authentication layers as we rebuilt our Meteor app from the ground up in Node.js. This also had the added benefit of being able to deploy the entire application in the same AWS VPCs. Thankfully, AWS had also released their ALB product so that we didn’t have to build and maintain our own websocket layer in Amazon EC2. All of our microservices, except for one special Go one, are now in Node with an nginx frontend on each instance, all behind AWS Elastic Load Balancing (ELB) or ALBs running in AWS Elastic Beanstalk.
Karate DSL is extremely effective in those situations where you have a microservice still in development, but the "consumer" web-UI dev team needs to make progress. Just create a mock definition (feature) file, and since it is plain-text - it can easily be shared across teams via Git. Since Karate has a binary stand-alone executable, even teams that are not familiar with Java can use it to stand-up mock services. And the best part is that the mock serves as a "contract" - which the server-side team can use to practice test-driven development.
The new APIs were developed using a spec-first API approach for speed and sanity. The details of this approach are described in this blog post, and we relied on using Swagger and associated tools like Swagger UI.
A new service was created for managing the data. It provides a REST API for external use, and an internal API based on GraphQL. The service is built using Kotlin for increased developer productivity and happiness, and the Spring-Boot framework. PostgreSQL was chosen for the persistence layer, as we have non-trivial requirements that cannot be easily implemented on top of a key-value store.
The front-end has been built using React and querying the back-end service using an internal GraphQL API. We have plans of providing a public GraphQL API in the future.
New Jira Integrations: Bitbucket CircleCI AWS CodePipeline Octopus Deploy jFrog Azure Pipelines
The algorithms and data infrastructure at Stitch Fix is housed in #AWS. Data acquisition is split between events flowing through Kafka, and periodic snapshots of PostgreSQL DBs. We store data in an Amazon S3 based data warehouse. Apache Spark on Yarn is our tool of choice for data movement and #ETL. Because our storage layer (s3) is decoupled from our processing layer, we are able to scale our compute environment very elastically. We have several semi-permanent, autoscaling Yarn clusters running to serve our data processing needs. While the bulk of our compute infrastructure is dedicated to algorithmic processing, we also implemented Presto for adhoc queries and dashboards.
Beyond data movement and ETL, most #ML centric jobs (e.g. model training and execution) run in a similarly elastic environment as containers running Python and R code on Amazon EC2 Container Service clusters. The execution of batch jobs on top of ECS is managed by Flotilla, a service we built in house and open sourced (see https://github.com/stitchfix/flotilla-os).
At Stitch Fix, algorithmic integrations are pervasive across the business. We have dozens of data products actively integrated systems. That requires serving layer that is robust, agile, flexible, and allows for self-service. Models produced on Flotilla are packaged for deployment in production using Khan, another framework we've developed internally. Khan provides our data scientists the ability to quickly productionize those models they've developed with open source frameworks in Python 3 (e.g. PyTorch, sklearn), by automatically packaging them as Docker containers and deploying to Amazon ECS. This provides our data scientist a one-click method of getting from their algorithms to production. We then integrate those deployments into a service mesh, which allows us to A/B test various implementations in our product.
For more info:
- Our Algorithms Tour: https://algorithms-tour.stitchfix.com/
- Our blog: https://multithreaded.stitchfix.com/blog/
- Careers: https://multithreaded.stitchfix.com/careers/
#DataScience #DataStack #Data
I use Visual Studio Code because of community support and popularity it gained in very short period and many more extensions that are being contributed by community every day. I like the Python Engine in VSCode makes my work life productive. My most fav extensions are
Themes are always fun and make your development IDE productive especially with colors and error indicators etc..
We use Sass because I invented it! No, that's not a joke at all! Well, let me explain. So, we used Sass before I started at Rent the Runway because it's the de-facto industry standard for pre-compiled and pre-processed CSS. We do also use PostCSS for stuff like vendor prefixing and various transformations, but Sass (specifically SCSS) is the main developer-focused language for describing our styling. Some internal apps use styled-components and @Aphrodite, but our main website is allllll Sassy. Oh, but the non-joking part is the inventing part. /shrug
We just launched the Segment Config API (try it out for yourself here) — a set of public REST APIs that enable you to manage your Segment configuration. A public API is only as good as its #documentation. For the API reference doc we are using Postman.
Postman is an “API development environment”. You download the desktop app, and build API requests by URL and payload. Over time you can build up a set of requests and organize them into a “Postman Collection”. You can generalize a collection with “collection variables”. This allows you to parameterize things like
workspace_name so a user can fill their own values in before making an API call. This makes it possible to use Postman for one-off API tasks instead of writing code.
Then you can add Markdown content to the entire collection, a folder of related methods, and/or every API method to explain how the APIs work. You can publish a collection and easily share it with a URL.
This turns Postman from a personal #API utility to full-blown public interactive API documentation. The result is a great looking web page with all the API calls, docs and sample requests and responses in one place. Check out the results here.
Postman’s powers don’t end here. You can automate Postman with “test scripts” and have it periodically run a collection scripts as “monitors”. We now have #QA around all the APIs in public docs to make sure they are always correct
Along the way we tried other techniques for documenting APIs like ReadMe.io or Swagger UI. These required a lot of effort to customize.
Writing and maintaining a Postman collection takes some work, but the resulting documentation site, interactivity and API testing tools are well worth it.
Kubernetes powers our #backend services as it is very easy in terms of #devops (the managed version). We deploy everything using @helm charts as it provides us to manage deployments the same way we manage our code on GitHub . On every commit a CircleCI job is triggered to run the tests, build Docker images and deploy them to the registry. Finally on every master commit CircleCI also deploys the relevant service using Helm chart to our Kubernetes cluster
I built a project using Quasar Framework with Vue.js, vuex and axios on the frontend and Go, Gin Gonic and PostgreSQL on the backend. Deployment was realized using Docker and Docker Compose. Now I can build the desktop and the mobile app using a single code base on the frontend. UI responsiveness and performance of this stack is amazing.