How LaunchDarkly Serves Over 4 Billion Feature Flags Daily


Editor's note: By John Kodumal, CTO, LaunchDarkly



[Image: LaunchDarkly Platform]


Background

Feature flagging (wrapping a feature in a flag that’s controlled outside of deployment) is a technique for effective continuous delivery. For example, you can wrap a new signup form in a feature flag and then control which users see that form, all without having to redeploy code or modify a database. Engineering-driven companies (think Google, Facebook, Twitter) invest heavily in custom-built feature flag management systems to roll features out to whom they want, when they want. Smaller companies build and maintain their own feature flagging infrastructure or use simple open source projects that often don't even have a UI. I was previously an engineering manager at Atlassian, where I’d seen a team work on an internal feature flagging system, so I was aware of the complexity of the problem and the investment required to build a product that addressed the needs of larger development teams and enterprises. That’s where we saw an opportunity to start LaunchDarkly.
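To make the pattern concrete, here's a minimal sketch of what wrapping that signup form looks like on the server side. This is not LaunchDarkly's SDK or API; the in-memory flag store, flag key, and helper names are purely illustrative:

```go
package main

import (
	"fmt"
	"net/http"
)

// flagStore stands in for a real feature flag service. In practice the flag
// values are managed outside of deployment and evaluated per user against
// targeting rules, not hard-coded like this.
var flagStore = map[string]bool{
	"new-signup-form": false, // flip to true to roll the feature out
}

// isEnabled is a hypothetical evaluation helper. A real one would consider
// the user (percentage rollouts, individual targeting, etc.).
func isEnabled(flag string, userKey string) bool {
	return flagStore[flag]
}

func signupHandler(w http.ResponseWriter, r *http.Request) {
	user := r.URL.Query().Get("user")
	if isEnabled("new-signup-form", user) {
		fmt.Fprintln(w, "rendering the NEW signup form")
		return
	}
	fmt.Fprintln(w, "rendering the old signup form")
}

func main() {
	http.HandleFunc("/signup", signupHandler)
	http.ListenAndServe(":8080", nil)
}
```

The point is that the branch lives in code, but the decision lives outside the deploy: changing who sees the new form is a flag change, not a release.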




We're currently serving over 4 billion feature flag requests per day for companies like Microsoft, Atlassian, Ten-X, and CircleCI. Many of our customers report that we’ve changed the way they do development-- we de-risk new feature launches, eliminate the need for painful long-lived branches, and empower product managers, QA, and others to use feature flags to improve their users’ experience.

General Architecture

You can think of LaunchDarkly as being split up into three pieces: a monolithic web application, a streaming API that serves feature flags, and an analytics processing pipeline that's structured as a set of microservices. We've written almost all of this in Go.

Go has really worked well for us. We love that our services compile from scratch in seconds, and produce small statically linked binaries that can be deployed easily and run in a small footprint. I'd done a lot with Scala at Atlassian, but I'd grown frustrated with the slow compilation times and overhead of the JVM. Our monolith has about a 6MB memory footprint-- try that on the JVM!

I'm generally not a fan of large web frameworks like Django or Rails. Too much "magic" for me. I prefer to build on top of smaller libraries that serve specific needs. To that end, both our monolith and our microservices rely heavily on a home-built framework layer that uses libraries like Gorilla Mux.

Our framework makes it trivial to add a new resource to our REST API and get a ton of essential functionality out of the box-- with a few lines of code, you get authentication, APM with New Relic, metrics pumped to Graphite, CORS support, and more.
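As an illustration of the idea (this is not our framework, just a minimal sketch of the pattern on top of gorilla/mux), registering a resource and picking up cross-cutting middleware looks roughly like this; the middleware bodies are placeholders for the real auth, New Relic, Graphite, and CORS plumbing:

```go
package main

import (
	"log"
	"net/http"

	"github.com/gorilla/mux"
)

// authMiddleware and metricsMiddleware stand in for the cross-cutting
// concerns the framework layer applies to every resource.
func authMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") == "" {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func metricsMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// A real framework would record per-route timing and metrics here.
		log.Printf("%s %s", r.Method, r.URL.Path)
		next.ServeHTTP(w, r)
	})
}

func listFlags(w http.ResponseWriter, r *http.Request) {
	w.Write([]byte(`[{"key":"new-signup-form","on":false}]`))
}

func main() {
	r := mux.NewRouter()
	r.Use(authMiddleware, metricsMiddleware) // every resource gets these for free
	r.HandleFunc("/api/flags", listFlags).Methods("GET")
	log.Fatal(http.ListenAndServe(":8080", r))
}
```

Adding another resource is one `HandleFunc` line; the middleware chain takes care of the rest.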

The web application monolith has a pretty standard architecture. Some of the technologies we use include:

  • MongoDB -- as our core application data store. It's popular to make fun of Mongo these days, but we've found it to be a great database technology as long as you don't store too many things in it. Anything you can count on your fingers and toes should be fine.
  • ElasticSearch -- handles user search and segmentation.
  • Redis -- caching, of course.
  • HAProxy -- as a load balancer.


[Image: LaunchDarkly Architecture]


Serving feature flags, fast

One of the cool and novel parts of LaunchDarkly is our streaming architecture, which allows us to serve feature flag changes instantly. Think of it like a real-time, in-memory database containing feature flag settings. The closest comparison would be something like Firebase, except Firebase is really more focused on the client-side web and mobile, whereas we do that and the server side.

We use several technologies to drive our streaming API. The most important is Pushpin / Fanout, which abstracts away the management of long-lived streaming connections so we can focus on building simple REST APIs.

We also use Fastly as a CDN. Fastly is perfect for us-- we can use VCL to write custom caching rules, and can purge content in milliseconds. If you're caching dynamic content (as opposed to, say, cat GIFs), or you find yourself needing to purge content programmatically, or you want the flexibility of Varnish in addition to the global network of POPs a CDN can provide, Fastly is the best choice out there. Their support team is also fantastic.

When assembled together, these technologies allow our customers to change their feature flag settings on our dashboard and have their new rollout settings streamed to thousands of servers in a hundred milliseconds or less.
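At the protocol level, what a connected SDK consumes is a long-lived stream of flag-change events. The sketch below shows the general shape of such a stream as server-sent events in plain Go; it is not our actual service (which sits behind Fanout and Fastly), and the event payload and channel plumbing are illustrative only:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// updates is a stand-in for the real source of flag changes (in production
// these originate from the dashboard and fan out through the streaming tier).
var updates = make(chan string)

func streamHandler(w http.ResponseWriter, r *http.Request) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")

	for {
		select {
		case msg := <-updates:
			// Each flag change is pushed as an SSE event; the client applies
			// it to its in-memory flag store immediately.
			fmt.Fprintf(w, "event: patch\ndata: %s\n\n", msg)
			flusher.Flush()
		case <-r.Context().Done():
			return
		}
	}
}

func main() {
	// Simulate a flag change every few seconds.
	go func() {
		for {
			time.Sleep(5 * time.Second)
			updates <- `{"key":"new-signup-form","on":true}`
		}
	}()
	http.HandleFunc("/stream", streamHandler)
	http.ListenAndServe(":8080", nil)
}
```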

Analytics at scale

The other huge component of LaunchDarkly is our analytics processing pipeline. Our customers request over 4 billion feature flags per day, and we use analytics data from these requests to power a lot of the features in our product. A/B testing is an obvious example, but we also do things like determine when a feature flag has stopped being requested, so that you can manage technical debt and clean up old flags.

Our current pipeline involves an HTTP microservice that writes analytics data to DynamoDB. If we need to do any further processing (say, for A/B testing), then we enqueue another job into SQS. Another microservice reads jobs off of the SQS queue and processes them. Right now, we're actively evolving this pipeline. We've found that when we're under heavy load, we need to buffer calls to DynamoDB while we expand capacity instead of trying to process them immediately. Kafka is perfect for this-- so we're splitting that HTTP microservice into a smaller HTTP service that simply queues events to Kafka, and another service that consumes those events from Kafka and processes them.
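The ingest side of that split is deliberately thin: accept the event, enqueue it, return. Here's a rough sketch of what such an HTTP-to-Kafka service could look like, using the Shopify/sarama client purely as an example (the broker address and topic name are made up):

```go
package main

import (
	"io/ioutil"
	"log"
	"net/http"

	"github.com/Shopify/sarama"
)

func main() {
	// Hypothetical broker list and topic, for illustration only.
	config := sarama.NewConfig()
	config.Producer.Return.Successes = true // required for SyncProducer
	producer, err := sarama.NewSyncProducer([]string{"localhost:9092"}, config)
	if err != nil {
		log.Fatal(err)
	}
	defer producer.Close()

	http.HandleFunc("/events", func(w http.ResponseWriter, r *http.Request) {
		body, err := ioutil.ReadAll(r.Body)
		if err != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		// Do as little as possible here: enqueue the raw event and return.
		// Downstream consumers handle the DynamoDB writes and A/B processing.
		_, _, err = producer.SendMessage(&sarama.ProducerMessage{
			Topic: "analytics-events",
			Value: sarama.ByteEncoder(body),
		})
		if err != nil {
			http.Error(w, "enqueue failed", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusAccepted)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Keeping the HTTP tier this dumb means Kafka absorbs the bursts, and the consumers can be scaled or paused independently while we expand DynamoDB capacity.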

We actually use LaunchDarkly to control this evolution. We have a feature flag that controls whether a request goes through our old analytics pipeline, or the new Kafka-based pipeline we're rolling out. Once the new pipeline is enabled for all customers, we can clean up the code and switch over completely to the Kafka pipeline. This is a use case that surprises a lot of customers-- they think of feature flags in terms of controlling user-visible features (release toggles), but they are extremely valuable for other use cases like ops toggles, experiments, and permission management.
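In code, that ops toggle is just another flag check at the point where events enter the pipeline. Roughly, under the assumption of a hypothetical per-customer flag helper and pipeline types (none of these names are our actual code):

```go
package main

import "fmt"

// The pipeline types and flag helper below are illustrative stand-ins.

type pipeline interface {
	Write(event []byte) error
}

type legacyPipeline struct{} // DynamoDB + SQS path

func (legacyPipeline) Write(e []byte) error {
	fmt.Println("legacy pipeline:", string(e))
	return nil
}

type kafkaPipeline struct{} // new Kafka-based path

func (kafkaPipeline) Write(e []byte) error {
	fmt.Println("kafka pipeline:", string(e))
	return nil
}

// boolVariation is a hypothetical per-customer flag evaluation; in reality
// the "use-kafka-pipeline" flag is rolled out gradually from the dashboard.
func boolVariation(flag, customerID string, defaultValue bool) bool {
	return defaultValue
}

// handleEvent routes each analytics event through whichever pipeline the
// flag selects for this customer.
func handleEvent(customerID string, event []byte) error {
	var p pipeline = legacyPipeline{}
	if boolVariation("use-kafka-pipeline", customerID, false) {
		p = kafkaPipeline{}
	}
	return p.Write(event)
}

func main() {
	handleEvent("customer-123", []byte(`{"kind":"feature","key":"new-signup-form"}`))
}
```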

[Image: LaunchDarkly Platform]

As we scaled this service out to handle tens of thousands of requests per second, we learned an important lesson about microservice construction. When we first built many of these services, we thought in terms of building a separate service per concern. For example, we’d build a service that would read in analytics events and serve the autocomplete functionality on the site. The web application would make a sub-request to this service when it had an autocomplete request from the site.

We quickly learned that the need for fault tolerance and isolation trumps the conceptual neatness of having a service per concern. With fault tolerance in mind, we sliced our services along a different axis-- separating high-throughput analytics writes from the lower-volume read requests coming from the site. This shift dramatically improved the performance of our site, as well as our ability to evolve and scale the huge write load we see on the analytics side.

Infrastructure

As you might have inferred, we use AWS as our hosting provider. We’re fairly conservative when it comes to adopting new technologies-- deployment for us consists of a set of Ansible scripts that spin up EC2 boxes for our various services. We don’t yet use ECS or Docker containers-- which by extension means we don’t use anything for container orchestration. A long while back, we spiked a migration to Mesosphere, but we ran into enough issues that we didn’t proceed. We do think that these technologies are the future, but that future is not now, at least for us.

So maturity is one issue that prevents us from adopting some of the latest whiz-bang ops technology. There are other technologies that we find interesting, like Amazon’s API Gateway, but the pricing models just don’t work for us-- at tens of thousands of requests per second, they’re non-starters.

Other services

For customer communications and support, we use Intercom, Slack, and GrooveHQ. We also recently started using elevio, and we've found it's a great way to turn Intercom questions into trackable support tickets.

We use ReadMe.io for our product and developer API documentation, GitHub holds all our code hostage, and CircleCI helps us integrate continuously.

What’s next?

We’re constantly evolving our service to improve efficiency and scale. Besides the Kafka switchover, we’re looking at using Cassandra for some of the work that DynamoDB is doing right now. We’re also keenly interested in Disque as a queuing solution, especially because we’ve had so much positive experience with Redis.

More aspirationally, we might try spiking some of our new services in Rust. I’m a functional programmer at heart, and while I am appreciative of the speed and tooling around Go, it would be nice to regain some of the expressiveness and elegance of a functional language while retaining what we like about Go (the fast compilation times, ease of deployment). If we do try it out, we’ll do so in a cautious manner, and isolate the trial to a new microservice somewhere.
