E-Commerce at Scale: Inside Shopify's Tech Stack

Shopify
Shopify powers tens of thousands of online retailers including General Electric, Amnesty International, CrossFit, Tesla Motors, Encyclopaedia Britannica, Foo Fighters, GitHub, and more. Our platform allows users to easily and quickly create their own online store without all the technical work involved in developing their own website, or the huge expense of having someone else build it. Shopify lets merchants manage all aspects of their shops: uploading products, changing the design, accepting credit card orders, and viewing their incoming orders and completed transactions.

Written by Kir Shatrov, Production Engineer at Shopify


Background

Shopify is a multi-channel commerce platform for small and medium businesses that lets you create a shop and sell products wherever you want: online via web store or social media and offline with a POS card reader. Shopify powers 600K merchants and serves 80K requests per second at peak.

While helping aspiring entrepreneurs launch their stores, Shopify also hosts some of the world's largest sales for the Super Bowl, Kylie Cosmetics, and celebrities like Justin Bieber and Kanye West. These "flash sales" are tricky from an engineering point of view because of their unpredictably large volumes of traffic.

My name is Kir Shatrov and I'm a Senior Production Engineer at Shopify working on the Service Patterns team. Our team owns areas like sharding, scalability, and reliability of the platform. We provide guidelines and APIs on how to write software that scales by default, which essentially makes the rest of the developers at Shopify our customers. Our team's motto is "make scale invisible for developers".


Engineering at Shopify

Before 2015, we had Operations and Performance teams. Around 2015, we decided to create the Production Engineering department and merge them. The department is responsible for building and maintaining the common infrastructure that allows the rest of the product development teams to run their code.

Both Production Engineering and all the product development teams share responsibility for the ongoing operation of our end user applications. This means all technical roles share monitoring and incident response, with escalation happening laterally to bring in any skill set required to restore service in case of problems.


Initial architecture and stack

In 2004, Shopify’s CEO and founder, Tobi Lütke, was building out an e-commerce store for snowboarding products. Unsatisfied with the existing e-commerce products on the market, Tobi decided to build his own SaaS platform using Ruby on Rails.

At that time, Rails wasn't even 1.0 yet, and the only version of the framework was exchanged as a .zip archive by email. Tobi joined Rails creator David Heinemeier Hansson (DHH) and started contributing to Ruby on Rails while building Shopify.

Shopify is now one of the world's largest and oldest Rails apps. It’s never been rewritten and still uses the original codebase, though it has matured considerably over the past decade. All of Tobi’s original commits are still in the version control history.

The bet on Rails greatly shaped how we think at Shopify and empowered us to deliver product as fast as possible. While there are parts of the framework that sometimes make it harder to scale (e.g. ActiveRecord callbacks and code organization), many of us tend to agree with Tobi that Rails is what allowed Shopify to move from a garage startup to a public company.

The core Shopify app has remained a Rails monolith, but we also have hundreds of other Rails apps across the organization. These are not microservices, but domain-specific apps: Shipping (talks with various shipping providers), Identity (single sign-on across all Shopify stores), and App Store, to name a few. Managing a hundred apps and keeping them up to date with security updates can be tough, so we've developed ServicesDB, an internal app that keeps track of all production services and helps developers make sure that they don't miss anything important.


Figure: ServicesDB in action


ServicesDB keeps a checklist for each app: ownership, uptime, logs, on-call rotation, exception reporting, and gem security updates. If there are problems with any of those, ServicesDB opens a GitHub issue and pings the owners of the app to ask them to address it. ServicesDB also makes it easy to query the infrastructure and answer questions like, “How many apps are on Rails 4.2? How many apps are using an outdated version of gem X? Which apps are calling this service?”
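To make the idea concrete, here is a minimal Python sketch of such a checklist and one of the queries. ServicesDB's real data model is not described in this article, so the `Service` record, its fields, and the `services_on_rails` helper are all hypothetical.

```python
from dataclasses import dataclass, field

# Checklist items taken from the paragraph above; the record layout is hypothetical.
CHECKLIST = ("ownership", "uptime", "logs", "on_call_rotation",
             "exception_reporting", "gem_security_updates")


@dataclass
class Service:
    name: str
    rails_version: str
    passed: set = field(default_factory=set)  # checklist items that currently pass

    def failing_checks(self):
        """Return the checklist items this service does not satisfy, in order."""
        return [item for item in CHECKLIST if item not in self.passed]


def services_on_rails(services, version_prefix):
    """Answer a query such as 'How many apps are on Rails 4.2?'."""
    return [s.name for s in services if s.rails_version.startswith(version_prefix)]


services = [
    Service("shipping", "4.2.10", set(CHECKLIST)),
    Service("identity", "5.1.4", set(CHECKLIST) - {"logs"}),
]

for svc in services:
    for check in svc.failing_checks():
        # A real system would open an issue and ping the owners at this point.
        print(f"{svc.name}: failing check '{check}'")
```

Keeping the checklist as data rather than code means a new requirement can be added in one place and every service is evaluated against it on the next pass.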


Our current stack

As is common in the Rails stack, since the very beginning, we've stayed with MySQL as a relational database, memcached for key/value storage and Redis for queues and background jobs.


Figure: Shopify Rails stack


In 2014, we could no longer store all our data in a single MySQL instance - even by buying better hardware. We decided to use sharding and split all of Shopify into dozens of database partitions.

Sharding played nicely for us because Shopify merchants are isolated from each other and we were able to put a subset of merchants on a single shard. It would have been harder if our business assumed shared data between customers.
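A rough sketch of what tenant-based sharding can look like is shown below. The article does not say how shops are mapped to shards, so this assumes a simple lookup table from shop ID to shard; the `ShardDirectory` class and its least-loaded placement rule are hypothetical.

```python
class ShardDirectory:
    """Maps each shop to the single shard that holds all of its data."""

    def __init__(self, shard_count):
        self.shard_count = shard_count
        self._shop_to_shard = {}

    def assign(self, shop_id):
        """Place a new shop on the shard that currently has the fewest shops."""
        if shop_id in self._shop_to_shard:
            return self._shop_to_shard[shop_id]
        load = [0] * self.shard_count
        for shard in self._shop_to_shard.values():
            load[shard] += 1
        shard = load.index(min(load))  # ties go to the lowest shard index
        self._shop_to_shard[shop_id] = shard
        return shard

    def shard_for(self, shop_id):
        """Every query for a shop is routed to exactly one shard."""
        return self._shop_to_shard[shop_id]


directory = ShardDirectory(shard_count=3)
for shop_id in (101, 102, 103, 104):
    directory.assign(shop_id)

print(directory.shard_for(104))
```

Because no query ever needs data from two shops at once, a request only has to resolve one shop ID to one shard, which is what makes this kind of partitioning straightforward for isolated tenants.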

The sharding project bought us some time regarding database capacity, but as we soon found out, there was a huge single point of failure in our infrastructure. All those shards were still using a single Redis. At one point, the outage of that Redis took down all of Shopify, causing a major disruption we later called “Redismageddon”. This taught us an important lesson: avoid any resources that are shared across all of Shopify.

Over the years, we moved from shards to the concept of "pods". A pod is a fully isolated instance of Shopify with its own datastores like MySQL, Redis, memcached. A pod can be spawned in any region. This approach has helped us eliminate global outages. As of today, we have more than a hundred pods, and since moving to this architecture we haven't had any major outages that affected all of Shopify. An outage today only affects a single pod or region.
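The isolation property can be illustrated with a small sketch. The `Pod` class, the hostnames, and the shop-to-pod mapping below are invented for illustration and do not reflect how pods are actually configured.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    pod_id: int
    region: str
    healthy: bool = True

    def datastores(self):
        # Each pod gets its own set of datastores; nothing is shared between pods.
        # The hostnames are made up for illustration.
        return {name: f"{name}.pod{self.pod_id}.{self.region}.internal"
                for name in ("mysql", "redis", "memcached")}


def affected_shops(shop_to_pod, pods):
    """Shops that see an outage: only those living on an unhealthy pod."""
    down = {p.pod_id for p in pods if not p.healthy}
    return sorted(shop for shop, pod_id in shop_to_pod.items() if pod_id in down)


pods = [Pod(1, "us-east"), Pod(2, "us-central"), Pod(3, "eu-west")]
shop_to_pod = {"shop-a": 1, "shop-b": 2, "shop-c": 2, "shop-d": 3}

pods[1].healthy = False  # simulate losing the Redis that belongs to pod 2
print(affected_shops(shop_to_pod, pods))
```

In this model, a failed Redis can only hurt the shops assigned to its pod, which is the difference from the earlier setup where every shard depended on one shared Redis.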


Figure: Shopify pods architecture


As we grew into hundreds of shards and pods, it became clear that we needed a solution to orchestrate those deployments. Today, we use Docker, Kubernetes, and Google Kubernetes Engine to make it easy to bootstrap resources for new Shopify Pods. At the load balancer level, we use Nginx, Lua, and OpenResty, which allow us to write scriptable load balancers.

The client-side stack of Shopify Admin has been a long journey. It started with HTML templates, jQuery, and prototype.js. We moved to Batman.js, our in-house single-page application (SPA) framework, in 2013. Then, we re-evaluated our approach and moved back to statically rendered HTML and vanilla JavaScript. As the front-end ecosystem matured, we felt that it was time to rethink our approach again. Last year, we started working on moving Shopify Admin to React and TypeScript.

Many things have changed since the days of jQuery and Batman. JavaScript execution is much faster. We can easily render our apps on the server to do less work on the client, and the resources and tooling for developers are substantially better with React than we ever had with Batman.

Another very notable difference is that now we have a much better solution for ensuring business logic does not leak into the client — GraphQL. The Admin becomes just another GraphQL client and follows the same patterns established by the mobile apps: no data persistence, no reliance on the server for anything that needs to be shared between clients, and extremely efficient fetching of resources for a view.


How we build, test, and deploy

The Shopify monolith has around 100K unit tests. Many of those involve heavy ORM calls, so they aren't very fast. To keep the shipping pipeline fast, we've massively invested in our CI infrastructure.

We use Buildkite as a CI platform. What makes Buildkite unique is that it lets you run tests in your own way, on your own hardware, while Buildkite orchestrates builds and provides the user interface.


Figure: Shopify Buildkite


The build of our monolith takes 15-20 minutes and involves hundreds of parallel CI workers to run all 100K tests. Parallel test workers allow us to keep shipping. Otherwise, a single build could take days. We have hundreds of developers shipping new features and improvements every day, and it’s crucial that we keep the continuous integration pipeline fast.
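To show why parallel workers cut wall-clock time, here is a small sketch that splits tests across workers. The article does not describe how tests are distributed, so the greedy "longest test to the least-loaded worker" strategy, the test names, and the durations are all assumptions.

```python
import heapq


def partition_tests(durations, worker_count):
    """Greedy split: give the next-longest test to the least-loaded worker."""
    # Heap entries are (total_seconds, worker_index) so the lightest worker pops first.
    heap = [(0.0, i) for i in range(worker_count)]
    heapq.heapify(heap)
    buckets = [[] for _ in range(worker_count)]
    for name, seconds in sorted(durations.items(), key=lambda kv: -kv[1]):
        total, idx = heapq.heappop(heap)
        buckets[idx].append(name)
        heapq.heappush(heap, (total + seconds, idx))
    return buckets


def wall_clock(durations, buckets):
    """The build finishes when the slowest worker finishes."""
    return max(sum(durations[t] for t in bucket) for bucket in buckets)


# Made-up test names and durations in seconds.
durations = {"orders_test": 8.0, "checkout_test": 7.0, "cart_test": 4.0,
             "theme_test": 3.0, "shop_test": 2.0}
buckets = partition_tests(durations, worker_count=2)

print(sum(durations.values()))         # serial time: 24.0
print(wall_clock(durations, buckets))  # parallel time with two workers: 13.0
```

The same arithmetic scales up: the more evenly the slowest tests are spread, the closer the build time gets to the total test time divided by the number of workers.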

When the build is green, it's time to deploy changes to production. We don't practice staging or canary deploys; instead, we rely on feature flags and fast rollbacks in case something goes wrong.
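As an illustration of the feature-flag side of this approach, the sketch below gates a code path by a percentage of shops. It is not Shopify's flag system; the `FeatureFlags` class, the flag name, and the hashing scheme are assumptions.

```python
import hashlib


class FeatureFlags:
    """Percentage rollout keyed on shop ID; a hypothetical design."""

    def __init__(self):
        self._rollout = {}  # flag name -> percentage of shops enabled (0-100)

    def set_rollout(self, flag, percent):
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._rollout[flag] = percent

    def enabled(self, flag, shop_id):
        percent = self._rollout.get(flag, 0)  # unknown flags are off
        digest = hashlib.sha256(f"{flag}:{shop_id}".encode()).digest()
        bucket = int.from_bytes(digest[:4], "big") % 100  # stable value in 0..99
        return bucket < percent


flags = FeatureFlags()
flags.set_rollout("new_checkout", 10)  # ship the code dark, enable for 10% of shops

if flags.enabled("new_checkout", shop_id=42):
    print("serving the new code path")
else:
    print("serving the existing code path")
```

Because each shop hashes to a fixed bucket, raising the percentage only adds shops, and setting it back to 0 turns the feature off for everyone without a new deploy.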


Figure: Shopify ShipIt engine


ShipIt, our deployment tool, is at the heart of Continuous Delivery at Shopify. ShipIt is an orchestrator that runs and tracks the progress of any deploy script that you provide for a project. It supports deploying to RubyGems, pip, Heroku, and Capistrano out of the box. For us, it's mostly kubernetes-deploy, or Capistrano for legacy projects.


Figure: A ShipIt Slack notification sent when your code is being deployed


We use a slightly tweaked GitHub flow, with feature development happening in branches and the master branch being the source of truth for the state of things in production. When your PR is ready, you add it to the Merge Queue in ShipIt. The idea behind the Merge Queue is to control the rate at which code is merged into the master branch. During busy hours, many developers want to merge their PRs, but we don't want to introduce too many changes to the system at once. The Merge Queue limits deploys to 5-10 commits at a time, which makes it easier to identify issues and roll back if we notice any unexpected behaviour after the deploy.
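The batching behaviour can be sketched in a few lines. This is not ShipIt's implementation; the `MergeQueue` class is hypothetical, and it simplifies by treating each PR as a single commit.

```python
from collections import deque


class MergeQueue:
    """FIFO queue that releases pull requests to master in bounded batches."""

    def __init__(self, max_batch=10):
        self.max_batch = max_batch
        self._pending = deque()

    def enqueue(self, pr_number):
        self._pending.append(pr_number)

    def next_batch(self):
        """Pop up to max_batch PRs, oldest first, to merge and deploy together."""
        batch = []
        while self._pending and len(batch) < self.max_batch:
            batch.append(self._pending.popleft())
        return batch


queue = MergeQueue(max_batch=5)
for pr in range(1, 13):  # twelve PRs arrive during a busy hour
    queue.enqueue(pr)

batches = []
while True:
    batch = queue.next_batch()
    if not batch:
        break
    batches.append(batch)

print(batches)  # three deploys: PRs 1-5, 6-10, and 11-12
```

With a bounded batch, a bad deploy narrows the search to at most a handful of changes instead of everything merged that hour.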

We use a browser extension to make Merge Queue play nicely with the Merge button on GitHub:


Figure: Shopify GitHub flow


Both ShipIt and kubernetes-deploy are open source, and we've heard quite a few success stories from companies who have adopted our flow.


Next Challenges

All systems at Shopify have to be designed with scale in mind. At the same time, it still feels like you're working on a classic Rails app. The amount of engineering effort put into this is incredible. For a developer writing a database migration, it looks just like it would for any other Rails app, but under the hood that migration is asynchronously applied to 100+ database shards with zero downtime. The story is similar for any other aspect of our infrastructure, from CI and tests to deploys.
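The fan-out part of that idea can be sketched as follows. The actual migration tooling is not described here, so `apply_migration` is a stand-in that only returns a string, and the zero-downtime mechanics (which would need an online schema change tool) are out of scope for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def apply_migration(shard_id, statement):
    """Stand-in for running a schema change on one shard (hypothetical)."""
    # A real implementation would run an online schema change against the shard.
    return f"shard {shard_id}: applied {statement!r}"


def migrate_all_shards(shard_ids, statement, max_parallel=8):
    """Apply the same migration to every shard, a few shards at a time."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {pool.submit(apply_migration, s, statement): s for s in shard_ids}
        for future in as_completed(futures):
            shard = futures[future]
            try:
                results[shard] = future.result()
            except Exception as exc:  # record the failure and keep going
                failures[shard] = exc
    return results, failures


results, failures = migrate_all_shards(range(1, 101), "ADD COLUMN note TEXT")
print(len(results), len(failures))
```

Capping `max_parallel` keeps the migration from touching every shard at once, and collecting failures per shard makes it possible to retry only the shards that did not finish.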

In Production Engineering, we've put a lot of effort into migrating our infrastructure to Kubernetes. Some approaches and design decisions had to be re-evaluated, as they were not ready for cloud environments. At the same time, many of those investments in Kubernetes have already started to pay off. What used to take me days of writing Chef cookbooks is now a matter of a couple of changes to Kubernetes YAML. I expect that our Kubernetes foundation will mature and unlock even more possibilities for us to scale.

With tools like Semian and Toxiproxy, we've done a great job of shaping our monolith towards high reliability and resiliency. At the same time, we’re approaching one hundred other production services running at the company, most of them using Rails. With a tool like ServicesDB, we can verify that all of them are using the same patterns as the monolith, spreading the lessons we learned from a decade of operating Rails apps at scale.

Many of these services also need to talk to each other in some way, and how they do it is currently up to them. Some services communicate via a message log like Kafka and some use a REST API over HTTP. Lately, we've been looking into options for Shopify-wide RPC and Service Mesh. I expect that over the next year, we'll define how applications will communicate on our platform in a way that will be resilient and scalable by default.


Like the sound of this stack? Shopify is hiring. Come help us to make commerce better for everyone. Or join Production Engineering, and help us continue to evolve the stack that makes commerce better at Shopify than anywhere else in the world.
