We migrated to Kubernetes around January 2018 to reduce operational costs and manage future scaling. The move also forced us into other improvements and better habits, which have paid large dividends in product stability and developer productivity. Choosing Kubernetes in particular was easy, largely because kops made deploying and configuring it straightforward.
We first began using Papertrail largely because of its plug-and-play integration with Kubernetes. Being able to launch a single DaemonSet and see logs within a few seconds was a strong demonstration of its capabilities, and it has continued to be valuable any time we need to drill into alerts.
Kafka was only introduced to our platform in August 2018, both to manage our data pipeline and to replace the other messaging systems we had used to decouple components of our system. Kafka provides the scale and storage we need to manage data for however many devices we might service. It has also helped us lay the groundwork for improved, highly detailed statistics gathering and analysis.
One of the very first tools I pulled in when I joined MachineShop was Datadog. We were lacking monitoring and Datadog was my go-to, and in the subsequent years it has thoroughly proven itself reliable and informative. We use Datadog both to detect a wide variety of system anomalies and errors and to provide highly detailed dashboards that indicate our system's health at a glance.
MachineShop adopted Go early on thanks to its robust ability to target specific architectures during builds, a key requirement for our edge daemon. As of January 2018, we also run Go for the vast majority of our backend services, thanks to efforts to move away from Java-based systems that weren't container-friendly. Additionally, we operate on a monorepo-plus-microservice structure that allows us to ship highly specialized services when needed while also maximizing code reuse.
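As a rough illustration of what targeting specific architectures looks like in practice, the sketch below prints one `go build` invocation per target using Go's standard `GOOS` and `GOARCH` environment variables. The target list, the `edge-daemon` binary name, the `./cmd/edge-daemon` package path, and the `CGO_ENABLED=0` setting are all assumptions made for the example, not a description of our actual build.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// target is one OS/architecture pair to cross-compile for.
type target struct {
	OS   string
	Arch string
}

// buildEnv returns the environment variables that tell the Go toolchain
// which platform to build for. CGO_ENABLED=0 is an assumption here: it
// keeps the build pure Go so no cross C toolchain is needed.
func buildEnv(t target) []string {
	return []string{"GOOS=" + t.OS, "GOARCH=" + t.Arch, "CGO_ENABLED=0"}
}

// outputName derives a per-platform binary name such as
// "edge-daemon-linux-arm64", adding ".exe" for Windows targets.
func outputName(base string, t target) string {
	name := fmt.Sprintf("%s-%s-%s", base, t.OS, t.Arch)
	if t.OS == "windows" {
		name += ".exe"
	}
	return name
}

// buildCommand renders the full shell command for one target.
func buildCommand(base, pkg string, t target) string {
	return fmt.Sprintf("%s go build -o %s %s",
		strings.Join(buildEnv(t), " "), outputName(base, t), pkg)
}

func main() {
	// Hypothetical target matrix for an edge daemon.
	targets := []target{
		{OS: "linux", Arch: "amd64"},
		{OS: "linux", Arch: "arm64"},
		{OS: "linux", Arch: "arm"},
		{OS: "windows", Arch: "amd64"},
	}

	fmt.Printf("host platform: %s/%s\n", runtime.GOOS, runtime.GOARCH)
	for _, t := range targets {
		// "edge-daemon" and "./cmd/edge-daemon" are placeholder names.
		fmt.Println(buildCommand("edge-daemon", "./cmd/edge-daemon", t))
	}
}
```

Running any of the printed commands from a single build machine produces a binary for that platform without needing a matching host, which is the property that matters for an edge daemon deployed on mixed hardware.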
We use Runscope to run system tests on all of our environments around the clock. This helps us identify system-spanning issues that aren't visible in isolation and react to them before they hit production. It also serves as yet another warning system in case something does find its way into prod.
nginx became part of our stack largely by virtue of the ingress-nginx controller for Kubernetes. It has proven reliable and easy to work with, and it helped us bring down our costs by moving from AWS Elastic Load Balancing (ELB)-backed services to Kubernetes Ingresses.