E-Commerce at Scale: Inside Shopify's Tech Stack

Shopify
Shopify is the leading omni-channel commerce platform. Merchants use Shopify to design, set up, and manage their stores across multiple sales channels, including mobile, web, social media, marketplaces, brick-and-mortar locations, and pop-up shops. The platform also provides merchants with a powerful back-office and a single view of their business, from payments to shipping. The Shopify platform was engineered for reliability and scale, making enterprise-level technology available to businesses of all sizes. Headquartered in Ottawa, Canada, Shopify currently powers over 1,000,000 businesses in approximately 175 countries and is trusted by brands such as Allbirds, Gymshark, PepsiCo, Staples, and many more.

Written by Kir Shatrov, Production Engineer at Shopify


Background

Shopify is a multi-channel commerce platform for small and medium businesses that lets you create a shop and sell products wherever you want: online via web store or social media and offline with a POS card reader. Shopify powers 600K merchants and serves 80K requests per second at peak.

While helping aspiring entrepreneurs to launch their stores, Shopify also holds some of the world's largest sales for the Super Bowl, Kylie Cosmetics, and celebrities like Justin Bieber and Kanye West. These "flash sales" are tricky from an engineering point of view because of their unpredictably large volumes of traffic.

My name is Kir Shatrov and I'm a Senior Production Engineer at Shopify working on the Service Patterns team. Our team owns areas like sharding, scalability, and reliability of the platform. We provide guidelines and APIs for writing software that scales by default, which essentially makes the rest of the developers at Shopify our customers. Our team's motto is "make scale invisible for developers".


Engineering at Shopify

Until 2015, we had an Operations and Performance team. Around that time, we decided to create the Production Engineering department and merge the teams into it. The department is responsible for building and maintaining the common infrastructure that allows the rest of the product development teams to run their code.

Production Engineering and the product development teams share responsibility for the ongoing operation of our end-user applications. This means all technical roles share monitoring and incident response, with escalation happening laterally to bring in any skill set required to restore service when problems occur.


Initial architecture and stack

In 2004, Shopify’s CEO and founder, Tobi Lütke, was building out an e-commerce store for snowboarding products. Unsatisfied with the existing e-commerce products on the market, Tobi decided to build his own SaaS platform using Ruby on Rails.

At that time, Rails wasn't even 1.0 yet, and the only version of the framework was exchanged as a .zip archive by email. Tobi joined Rails creator David Heinemeier Hansson (DHH) and started contributing to Ruby on Rails while building Shopify.

Shopify is now one of the world's largest and oldest Rails apps. It’s never been rewritten and still uses the original codebase, though it has matured considerably over the past decade. All of Tobi’s original commits are still in the version control history.

The bet on Rails greatly shaped how we think at Shopify and empowered us to deliver product as fast as possible. While there are parts of the framework that sometimes make it harder to scale (e.g. ActiveRecord callbacks and code organization), many of us tend to agree with Tobi that Rails is what allowed Shopify to move from a garage startup to a public company.

The core Shopify app has remained a Rails monolith, but we also have hundreds of other Rails apps across the organization. These are not microservices, but domain-specific apps: Shipping (talks with various shipping providers), Identity (single sign-on across all Shopify stores), and App Store, to name a few. Managing a hundred apps and keeping them up to date with security updates can be tough, so we've developed ServicesDB, an internal app that keeps track of all production services and helps developers make sure that they don't miss anything important.


ServicesDB in Action


ServicesDB keeps a checklist for each app: ownership, uptime, logs, on-call rotation, exception reporting, and gem security updates. If there are problems with any of those, ServicesDB opens a GitHub issue and pings the owners of the app to ask them to address it. ServicesDB also makes it easy to query the infrastructure and answer questions like, “How many apps are on Rails 4.2? How many apps are using an outdated version of gem X? Which apps are calling this service?”
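
To make the checklist and query ideas concrete, here is a minimal sketch. It is not ServicesDB's actual code or data model: the `Service` struct, the `REQUIRED_FIELDS` list, and both helper methods are hypothetical names invented for illustration, and only two checklist items are modeled.

```ruby
# Hypothetical service inventory; ServicesDB's real schema is not public here.
Service = Struct.new(:name, :owner, :on_call_rotation, :rails_version, keyword_init: true)

# Checklist items that every production service is assumed to need.
REQUIRED_FIELDS = %i[owner on_call_rotation].freeze

# Returns the checklist items a service has not filled in yet.
def missing_checklist_items(service)
  REQUIRED_FIELDS.select { |field| service[field].nil? }
end

# Answers "which apps are on Rails 4.2?" for a given release series.
def services_on_rails(services, series)
  services.select { |service| service.rails_version.start_with?("#{series}.") }
end

inventory = [
  Service.new(name: "shipping", owner: "shipping-team", on_call_rotation: "shipping", rails_version: "5.1.4"),
  Service.new(name: "identity", owner: "identity-team", on_call_rotation: nil, rails_version: "4.2.10")
]

inventory.each do |service|
  missing = missing_checklist_items(service)
  puts "#{service.name}: missing #{missing.join(', ')}" unless missing.empty?
end
puts services_on_rails(inventory, "4.2").map(&:name).inspect
```

In a real system, a non-empty list of missing items would be the trigger for opening an issue and notifying the owners.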


Our current stack

As is common in the Rails stack, since the very beginning, we've stayed with MySQL as a relational database, memcached for key/value storage and Redis for queues and background jobs.


Shopify Rails Stack


In 2014, we could no longer store all our data in a single MySQL instance, even by buying better hardware. We decided to use sharding and split all of Shopify into dozens of database partitions.

Sharding worked well for us because Shopify merchants are isolated from each other, so we were able to put a subset of merchants on a single shard. It would have been harder if our business assumed shared data between customers.
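
The property that makes this work is that a shop's ID alone determines where its data lives. The sketch below illustrates that idea only; `ShardRouter`, the shard names, and the modulo-based initial assignment are assumptions made for the example, not a description of Shopify's sharding code.

```ruby
# Hypothetical sketch: each shop lives on exactly one shard, so any query
# for a shop can be routed using nothing but the shop's ID.
class ShardRouter
  def initialize(shard_names)
    @shard_names = shard_names
    @assignments = {} # shop_id => shard name
  end

  # Assign a shop to a shard once; repeated calls return the same shard.
  def assign(shop_id)
    @assignments[shop_id] ||= @shard_names[shop_id % @shard_names.size]
  end

  # Look up the shard for a shop that has already been assigned.
  def shard_for(shop_id)
    @assignments.fetch(shop_id) { raise KeyError, "shop #{shop_id} has no shard" }
  end

  # Reassign a shop to a different shard (illustrative; moving the data
  # itself is out of scope for this sketch).
  def move(shop_id, shard_name)
    raise ArgumentError, "unknown shard #{shard_name}" unless @shard_names.include?(shard_name)
    @assignments[shop_id] = shard_name
  end
end

router = ShardRouter.new(%w[shard_0 shard_1 shard_2])
router.assign(42)
puts router.shard_for(42)
```

Keeping an explicit lookup table, rather than computing the shard from the ID on every request, is what would allow a shop to be moved later without changing its ID.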

The sharding project bought us some time regarding database capacity, but as we soon found out, there was a huge single point of failure in our infrastructure. All those shards were still using a single Redis. At one point, an outage of that Redis took down all of Shopify, causing a major disruption we later called “Redismageddon”. This taught us an important lesson: avoid any resource that is shared across all of Shopify.

Over the years, we moved from shards to the concept of "pods". A pod is a fully isolated instance of Shopify with its own datastores like MySQL, Redis, memcached. A pod can be spawned in any region. This approach has helped us eliminate global outages. As of today, we have more than a hundred pods, and since moving to this architecture we haven't had any major outages that affected all of Shopify. An outage today only affects a single pod or region.


Shopify Pods Architecture


As we grew into hundreds of shards and pods, it became clear that we needed a solution to orchestrate those deployments. Today, we use Docker, Kubernetes, and Google Kubernetes Engine to make it easy to bootstrap resources for new Shopify Pods. At the load balancer level, we use Nginx, Lua, and OpenResty, which allow us to write scriptable load balancers.

The client-side stack of Shopify Admin has been a long journey. It started with HTML templates, jQuery, and prototype.js. In 2013, we moved to Batman.js, our in-house single-page application (SPA) framework. Then, we re-evaluated our approach and moved back to statically rendered HTML and vanilla JavaScript. As the front-end ecosystem matured, we felt that it was time to rethink our approach again. Last year, we started working on moving Shopify Admin to React and TypeScript.

Many things have changed since the days of jQuery and Batman. JavaScript execution is much faster. We can easily render our apps on the server to do less work on the client, and the resources and tooling for developers are substantially better with React than we ever had with Batman.

Another notable difference is that we now have a much better solution for ensuring business logic does not leak into the client: GraphQL. The Admin becomes just another GraphQL client and follows the same patterns established by the mobile apps: no data persistence, no reliance on the server for anything that needs to be shared between clients, and extremely efficient fetching of resources for a view.


How we build, test, and deploy

The Shopify monolith has around 100K unit tests. Many of those involve heavy ORM calls, so they aren't very fast. To keep the shipping pipeline fast, we've massively invested in our CI infrastructure.

We use Buildkite as a CI platform. What makes Buildkite unique is that it lets you run tests in your own way, on your own hardware, while Buildkite orchestrates builds and provides the user interface.


Shopify Buildkite


The build of our monolith takes 15-20 minutes and involves hundreds of parallel CI workers to run all 100K tests. Parallel test workers allow us to keep shipping. Otherwise, a single build could take days. We have hundreds of developers shipping new features and improvements every day, and it’s crucial that we keep the continuous integration pipeline fast.
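
One way to spread a test suite across workers is to balance by recorded runtime. The sketch below uses a greedy "slowest test first" assignment; this is a generic technique chosen for illustration, not necessarily how Shopify or Buildkite splits tests, and `partition_tests` and the sample durations are made up.

```ruby
# Hypothetical sketch: split tests across workers so that each worker
# ends up with a similar total runtime.
def partition_tests(durations, worker_count)
  workers = Array.new(worker_count) { { tests: [], total: 0.0 } }
  # Place the slowest tests first, always onto the least-loaded worker.
  durations.sort_by { |_name, seconds| -seconds }.each do |name, seconds|
    target = workers.min_by { |worker| worker[:total] }
    target[:tests] << name
    target[:total] += seconds
  end
  workers
end

# Made-up durations in seconds; a real suite would use recorded timings.
durations = {
  "checkout_test" => 9.0,
  "orders_test"   => 7.0,
  "cart_test"     => 4.0,
  "theme_test"    => 3.0,
  "search_test"   => 2.0
}

partition_tests(durations, 2).each_with_index do |worker, index|
  puts "worker #{index}: #{worker[:tests].join(', ')} (#{worker[:total]}s)"
end
```

With these numbers, the serial run would take 25 seconds, while the slower of the two workers finishes in 13. The same effect at a much larger scale is what keeps a 100K-test build within minutes.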

When the build is green, it's time to deploy changes to production. We don't practice staging or canary deploys; instead, we rely on feature flags and fast rollbacks in case something goes wrong.
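
A feature flag with a percentage rollout can stand in for a canary: the code ships to everyone, but only a slice of shops sees the new behaviour. The sketch below is a generic illustration, assuming a hypothetical `FeatureFlags` class and a made-up `:new_checkout` flag; it does not reflect Shopify's internal flag system.

```ruby
require "zlib"

# Hypothetical sketch of percentage-based feature flags.
class FeatureFlags
  def initialize
    @rollouts = Hash.new(0) # flag name => percentage of shops enabled
  end

  # Set the share of shops (0 to 100) that should see the flag enabled.
  def set_rollout(flag, percentage)
    raise ArgumentError, "percentage must be within 0..100" unless (0..100).cover?(percentage)
    @rollouts[flag] = percentage
  end

  # Hash the flag and shop ID into a stable bucket from 0 to 99, so a
  # given shop always gets the same answer for a given percentage.
  def enabled?(flag, shop_id)
    bucket = Zlib.crc32("#{flag}:#{shop_id}") % 100
    bucket < @rollouts[flag]
  end
end

flags = FeatureFlags.new
flags.set_rollout(:new_checkout, 10)
enabled = (1..1000).count { |shop_id| flags.enabled?(:new_checkout, shop_id) }
puts "enabled for #{enabled} of 1000 shops"

# Turning the feature off is a data change, not a deploy.
flags.set_rollout(:new_checkout, 0)
```

Because the bucket is derived from a hash rather than a random draw, raising the percentage only ever adds shops; no shop flips back and forth during a rollout.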


Shopify ShipIt Engine


ShipIt, our deployment tool, is at the heart of Continuous Delivery at Shopify. ShipIt is an orchestrator that runs and tracks the progress of any deploy script that you provide for a project. It supports deploying to RubyGems, Pip, Heroku, and Capistrano out of the box. For us, it's mostly kubernetes-deploy, or Capistrano for legacy projects.


A ShipIt Slack notification sent when your code is being deployed


We use a slightly tweaked GitHub flow, with feature development happening in branches and the master branch being the source of truth for the state of things in production. When your PR is ready, you add it to the Merge Queue in ShipIt. The idea behind the Merge Queue is to control the rate at which code is merged to the master branch. During busy hours, many developers want to merge their PRs, but we don't want to introduce too many changes to the system at once. The Merge Queue limits deploys to 5-10 commits at a time, which makes it easier to identify issues and roll back if we notice any unexpected behaviour after the deploy.
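
The batching behaviour can be sketched in a few lines. This is an illustration of the idea only, not ShipIt's implementation: the `MergeQueue` class, its method names, and the PR labels are hypothetical.

```ruby
# Hypothetical sketch of a merge queue that releases PRs in small batches.
class MergeQueue
  def initialize(max_batch_size:)
    @max_batch_size = max_batch_size
    @pending = []
  end

  # Add a PR to the back of the queue, ignoring duplicates.
  def enqueue(pull_request)
    @pending << pull_request unless @pending.include?(pull_request)
  end

  # Remove and return the next group of PRs to merge and deploy together.
  def next_batch
    @pending.shift(@max_batch_size)
  end

  def size
    @pending.size
  end
end

queue = MergeQueue.new(max_batch_size: 5)
(1..12).each { |number| queue.enqueue("PR ##{number}") }

until queue.size.zero?
  batch = queue.next_batch
  puts "deploying #{batch.size} changes: #{batch.join(', ')}"
end
```

Small batches keep the set of suspects short: if a deploy misbehaves, only a handful of changes need to be inspected or reverted.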

We use a browser extension to make Merge Queue play nicely with the Merge button on GitHub:


Shopify GitHub flow


Both ShipIt and kubernetes-deploy are open source, and we've heard quite a few success stories from companies who have adopted our flow.


Next Challenges

All systems at Shopify have to be designed with scale in mind. At the same time, it still feels like you're working on a classic Rails app. The amount of engineering effort put into this is incredible. For a developer writing a database migration, it looks just like it would for any other Rails app, but under the hood that migration is applied asynchronously to 100+ database shards with zero downtime. The story is similar for every other aspect of our infrastructure, from CI and tests to deploys.

In Production Engineering, we've put a lot of effort into migrating our infrastructure to Kubernetes. Some approaches and design decisions had to be re-evaluated because they were not ready for cloud environments. At the same time, many of those investments in Kubernetes have already started to pay off. What used to take me days of writing Chef cookbooks is now a matter of a couple of changes to Kubernetes YAML. I expect that our Kubernetes foundation will mature and unlock even more possibilities for us to scale.

With tools like Semian and Toxiproxy, we've done a great job of shaping our monolith towards high reliability and resiliency. At the same time, we’re approaching one hundred other production services running at the company, most of them using Rails. With a tool like ServicesDB, we can verify that all of them are using the same patterns as the monolith, spreading the lessons we learned from a decade of operating Rails apps at scale.
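
One of the resiliency patterns Semian provides is the circuit breaker: after repeated failures, callers stop waiting on a broken dependency and fail fast instead. The sketch below is a generic, simplified circuit breaker written for illustration. It is not Semian's API; the `CircuitBreaker` class, its parameters, and the injectable `clock` are all assumptions of this example.

```ruby
# Hypothetical, simplified circuit breaker (not Semian's API).
class CircuitBreaker
  class OpenError < StandardError; end

  def initialize(error_threshold:, reset_timeout:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @error_threshold = error_threshold # consecutive failures before opening
    @reset_timeout = reset_timeout     # seconds to fail fast before retrying
    @clock = clock
    @failures = 0
    @opened_at = nil
  end

  # Run the block unless the circuit is open. A success resets the failure
  # count; a failure counts towards opening (or re-opening) the circuit.
  def call
    raise OpenError, "circuit open, failing fast" if open?
    result = yield
    @failures = 0
    @opened_at = nil
    result
  rescue OpenError
    raise
  rescue StandardError
    @failures += 1
    @opened_at = @clock.call if @failures >= @error_threshold
    raise
  end

  # The circuit stays open until reset_timeout seconds have passed; after
  # that, the next call is let through as a trial.
  def open?
    return false if @opened_at.nil?
    @clock.call - @opened_at < @reset_timeout
  end
end

breaker = CircuitBreaker.new(error_threshold: 3, reset_timeout: 5)
4.times do
  begin
    breaker.call { raise IOError, "datastore unavailable" }
  rescue IOError => error
    puts "request failed: #{error.message}"
  rescue CircuitBreaker::OpenError => error
    puts "skipped: #{error.message}"
  end
end
```

A tool like Toxiproxy complements this pattern by injecting failures and latency in tests, so that the fail-fast path is exercised before a real outage does it.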

Many of these services also need to talk to each other in some way, and how they do it is currently up to them. Some services communicate via a message log like Kafka and some use a REST API over HTTP. Lately, we've been looking into options for Shopify-wide RPC and Service Mesh. I expect that over the next year, we'll define how applications will communicate on our platform in a way that will be resilient and scalable by default.


Like the sound of this stack? Shopify is hiring. Come help us to make commerce better for everyone. Or join Production Engineering, and help us continue to evolve the stack that makes commerce better at Shopify than anywhere else in the world.
