The algorithms and data infrastructure at Stitch Fix is housed in #AWS. Data acquisition is split between events flowing through Kafka and periodic snapshots of PostgreSQL DBs. We store data in an Amazon S3-based data warehouse. Apache Spark on YARN is our tool of choice for data movement and #ETL. Because our storage layer (S3) is decoupled from our processing layer, we can scale our compute environment very elastically. We have several semi-permanent, autoscaling YARN clusters running to serve our data processing needs. While the bulk of our compute infrastructure is dedicated to algorithmic processing, we also implemented Presto for ad hoc queries and dashboards.
Beyond data movement and ETL, most #ML-centric jobs (e.g. model training and execution) run in a similarly elastic environment, as containers running Python and R code on Amazon EC2 Container Service clusters. The execution of batch jobs on top of ECS is managed by Flotilla, a service we built in-house and open sourced (see https://github.com/stitchfix/flotilla-os).
At Stitch Fix, algorithmic integrations are pervasive across the business. We have dozens of data products actively integrated into systems. That requires a serving layer that is robust, agile, flexible, and allows for self-service. Models produced on Flotilla are packaged for deployment in production using Khan, another framework we've developed internally. Khan lets our data scientists quickly productionize the models they've developed with open source frameworks in Python 3 (e.g. PyTorch, sklearn) by automatically packaging them as Docker containers and deploying them to Amazon ECS. This gives our data scientists a one-click method of getting from their algorithms to production. We then integrate those deployments into a service mesh, which allows us to A/B test various implementations in our product.
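As a rough illustration of the A/B testing idea only: the post does not say how the service mesh splits traffic, so the following is a minimal sketch of one common technique, deterministic hash-based bucketing. The function name `assign_variant`, the experiment name, and the variant labels are all hypothetical and not part of Khan, Flotilla, or any mesh product.

```python
import hashlib
from typing import Mapping


def assign_variant(user_id: str, experiment: str, weights: Mapping[str, float]) -> str:
    """Map a user to a variant, deterministically and in proportion to weights.

    Hypothetical helper: hashing the experiment name together with the user id
    means the same user always lands in the same variant of one experiment.
    """
    if not weights:
        raise ValueError("at least one variant is required")
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Use the first 6 bytes as an integer and scale it into [0, 1).
    point = int.from_bytes(digest[:6], "big") / 2**48
    total = sum(weights.values())
    cumulative = 0.0
    for variant, weight in weights.items():
        cumulative += weight / total
        if point < cumulative:
            return variant
    # Floating-point rounding can leave the cumulative sum just under 1.0.
    return list(weights)[-1]


# Example: send roughly 10% of users to a candidate model deployment.
choice = assign_variant("user-123", "ranker-test", {"control": 0.9, "candidate": 0.1})
```

Because the assignment depends only on the user id and experiment name, no per-user state has to be stored to keep a user in the same arm across requests.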
For more info:
#DataScience #DataStack #Data
The new APIs were developed using a spec-first approach for speed and sanity. The details of this approach are described in this blog post, and we relied on Swagger and associated tools like Swagger UI.
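To make "spec-first" a little more concrete, here is a minimal sketch, assuming the spec is an OpenAPI-style document loaded into a dictionary. It treats the spec as the source of truth and reports any operation that has no handler yet. The paths, the `operationId` values, and the helper names are hypothetical; the post does not describe the actual spec or tooling beyond Swagger and Swagger UI.

```python
# Hypothetical, hand-written fragment of an OpenAPI-style spec.
SPEC = {
    "openapi": "3.0.0",
    "paths": {
        "/items": {
            "get": {"operationId": "listItems"},
            "post": {"operationId": "createItem"},
        },
        "/items/{id}": {
            "get": {"operationId": "getItem"},
        },
    },
}

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def operation_ids(spec: dict) -> set:
    """Collect every operationId declared under the spec's paths."""
    ids = set()
    for path_item in spec.get("paths", {}).values():
        for method, operation in path_item.items():
            if method in HTTP_METHODS and "operationId" in operation:
                ids.add(operation["operationId"])
    return ids


def missing_handlers(spec: dict, handlers: dict) -> list:
    """Return the operations the spec promises but the code does not implement."""
    return sorted(operation_ids(spec) - set(handlers))


# Two of the three declared operations are implemented so far.
handlers = {"listItems": lambda: [], "getItem": lambda item_id: None}
todo = missing_handlers(SPEC, handlers)
```

A check like this could run in a test suite so that the implementation cannot silently drift from the agreed spec.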
A new service was created for managing the data. It provides a REST API for external use and an internal API based on GraphQL. The service is built using Kotlin, for increased developer productivity and happiness, and the Spring Boot framework. PostgreSQL was chosen for the persistence layer, as we have non-trivial requirements that cannot be easily implemented on top of a key-value store.
The front-end is built with React and queries the back-end service through an internal GraphQL API. We plan to provide a public GraphQL API in the future.
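For readers unfamiliar with how a client talks to a GraphQL API, the sketch below builds the kind of HTTP request such a front-end typically sends: a POST whose JSON body carries a `query` string and a `variables` object. It is written in Python rather than the project's React code purely for illustration, and the endpoint URL, the `item` field, and the `build_graphql_request` helper are all assumptions, not the service's real schema.

```python
import json
import urllib.request
from typing import Optional


def build_graphql_request(endpoint: str, query: str,
                          variables: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a GraphQL-over-HTTP POST request."""
    body = json.dumps({"query": query, "variables": variables or {}}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical query: the field and argument names are made up for this sketch.
QUERY = """
query Item($id: ID!) {
  item(id: $id) { id name }
}
"""

request = build_graphql_request(
    "https://example.invalid/graphql",  # placeholder endpoint
    QUERY,
    {"id": "42"},
)
# Sending it would be: urllib.request.urlopen(request)
```

The client names exactly the fields it needs (`id` and `name` here), which is the main practical difference from calling a fixed REST resource.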
New Jira Integrations: Bitbucket, CircleCI, AWS CodePipeline, Octopus Deploy, JFrog, Azure Pipelines
We use React because it is the best framework to work quickly, cleanly and with good results. The composition paradigms of React are far superior to most other frameworks and allow for creating a smart and logical component tree that is highly performant.
It can also be used to create great UIs in combination with ES6, TypeScript, GraphQL and Emotion, if used right. The ecosystem offers solutions to nearly all problems. And I can only recommend using Gatsby if you need server-side rendering!
Starting a large project is always a daunting task so I sat down to plan. I didn't discover Quasar Framework until quite late on in this process - in fact, I had already built a huge portion of my app.
I knew from the outset I wanted to use Vue.js because of its simplicity and ease of use. Once that decision had been made, I decided to look at Bootstrap-style components already written for Vue. There were a great many, BUT none of them ticked all the boxes, and it was this that led me to Quasar. I was actually looking for a calendar component and someone pointed me to Quasar in a Stack Overflow comment - I fell in love. I even called my wife in to show her this beautiful component I had just found - I'm still not sure she shared my enthusiasm...
This was just the beginning for me with Quasar. At the time Quasar was v0.0.15 and since then it has grown from strength to strength. I have live apps on the App Store, Play Store and available online. Given it will soon handle browser extensions, this means Quasar can do anything your imagination would want.
Why did I choose Quasar Framework? Because there is nothing like it on the market. None that offer such a diverse build system using its home-brewed CLI, none that build SPA, PWA, SSR and browser extensions all from the same code base, none that have the same amazing community to help you when you're stuck, and none that have a team that works hard to achieve the common goal Quasar Framework has - to make the development experience easier and better for everyone.
That's why I made the decision to use Quasar Framework.