Redux: Scaling LaunchDarkly From 4 to 200 Billion Feature Flags Daily

LaunchDarkly
Serving over 200 billion feature flags daily to help software teams build better software, faster. LaunchDarkly helps eliminate risk for developers and operations teams from the software development cycle.

Written By John Kodumal, CTO and Co-Founder, LaunchDarkly


Background

LaunchDarkly is a feature management platform—we make it easy for software teams to adopt feature flags, helping them eliminate risk in their software development cycles. When we first wrote about our stack, we served about 4 billion feature flags a day. Last month, we averaged over 200 billion flags daily. To me, that's a mind-boggling number, and a testament to the degree to which we're able to change the way teams do software development. Some additional metrics:

  • Our global P99 flag update latency (the time it takes for a feature flag change on our dashboard to be reflected in your application) is under 500ms
  • Our primary Elasticsearch cluster indexes 175M+ docs / day
  • At daily peak, 1.5 million+ mobile devices and browsers and 500k+ servers are connected to our streaming APIs
  • Our event ingestion pipeline processes 40 billion events per day

We've scaled all our services through a process of gradual evolution, with an occasional bit of punctuated equilibrium. We've never rewritten a service from scratch, nor have we ever had to completely re-architect any of our services (we did migrate one service from a SaaS provider to a homegrown solution; more on that later). In fact, from a high level, our stack is very similar to what we described in our earlier post:

  • A Go monolith that serves our REST API and UI (JS / React)
  • A Go microservice that powers our streaming API
  • An event ingestion / transformation pipeline implemented as a set of Go microservices

We use AWS as our cloud provider, and Fastly as our CDN.

Let's talk about some of the changes we've made to scale these systems.

Buy first, build if necessary

Over the past year, we've shifted our philosophy on managed services and have moved several critical parts of our infrastructure away from self-managed options. The most prominent was our shift away from HAProxy to AWS's managed application load balancers (ALBs). As we scaled, managing our HAProxy fleet became a larger and larger burden. We spent a significant amount of time tuning our configuration files and benchmarking different EC2 instance types to maximize throughput. Emerging needs like DDoS protection and auto scaling turned into large projects that we needed to schedule urgently. Instead of continuing this investment, we chose to shift to managed ALB instances. This was a large project, but it quickly paid for itself as we've nearly eliminated the time spent managing load balancers. We also gained DDoS protection and auto scaling "for free".

As we've evolved or added additional infrastructure to our stack, we've biased towards managed services:

  • Most new backing stores are Amazon RDS instances now. We do use self-managed PostgreSQL with TimescaleDB for time-series data—this is made HA with the use of Patroni and Consul.
  • We also use managed ElastiCache instances instead of spinning up EC2 instances to run Redis workloads.
  • In our previous StackShare article, I wrote about a project to incorporate Kafka into our event ingestion pipeline. In keeping with our shift towards managed services, we shifted to Amazon's Kinesis instead of Kafka.

Managed services do have some drawbacks:

  • They're almost never cheaper (in raw dollars) than self-managed alternatives. Pricing is often more opaque, more variable, and harder to predict.
  • They offer much less visibility into the operation, errors, and availability of the service.
  • They create vendor lock-in.

Still, comparing only the raw cost of a managed service against a self-managed one is a false economy: factor in your team's time and the math is usually pretty clear.

There is one notable case where we've moved from a managed SaaS solution to a homegrown one. LaunchDarkly relies on a novel streaming architecture to push feature flag changes out in near real time. Our SDKs create persistent outbound HTTPS connections to the LaunchDarkly streaming APIs. When you change a feature flag on your dashboard, that change is pushed out using the server-sent events (SSE) protocol. When we initially built our streaming service, we relied heavily on a third-party service, Fanout, to manage persistent connections. Fanout worked well for us, but over time we found that we could introduce domain-specific performance and cost optimizations if we built a custom service for our use case. We created a Go microservice that manages persistent connections and is heavily optimized for the unique workloads associated with feature flag delivery. We use NATS as a message broker to connect our REST API to a fleet of EC2 instances running this microservice. Each of these instances can manage over 50,000 concurrent SSE connections.
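To make the fan-out idea concrete, here is a minimal sketch in Go of a broker that holds persistent SSE connections and pushes each flag change to all of them. It is not LaunchDarkly's implementation: the Broker type, its method names, the buffer size, and the payload are all hypothetical, and a direct call to Publish stands in for the NATS subscription a real deployment would use.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

// Broker tracks connected SSE clients (hypothetical type, for illustration only).
type Broker struct {
	mu      sync.Mutex
	clients map[chan string]struct{}
}

func NewBroker() *Broker {
	return &Broker{clients: make(map[chan string]struct{})}
}

// Subscribe registers a client and returns the channel it will receive messages on.
func (b *Broker) Subscribe() chan string {
	ch := make(chan string, 16) // small buffer so one slow client cannot block Publish
	b.mu.Lock()
	b.clients[ch] = struct{}{}
	b.mu.Unlock()
	return ch
}

// Unsubscribe removes a client, for example when its connection closes.
func (b *Broker) Unsubscribe(ch chan string) {
	b.mu.Lock()
	delete(b.clients, ch)
	b.mu.Unlock()
}

// Publish offers msg to every client, skipping any whose buffer is full,
// and returns the number of clients that accepted it.
func (b *Broker) Publish(msg string) int {
	b.mu.Lock()
	defer b.mu.Unlock()
	delivered := 0
	for ch := range b.clients {
		select {
		case ch <- msg:
			delivered++
		default: // client is too far behind; drop rather than stall the fleet
		}
	}
	return delivered
}

// ServeHTTP keeps the connection open and writes each message as an SSE "data:" frame.
func (b *Broker) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")
	ch := b.Subscribe()
	defer b.Unsubscribe(ch)
	for {
		select {
		case msg := <-ch:
			fmt.Fprintf(w, "data: %s\n\n", msg)
			flusher.Flush()
		case <-r.Context().Done(): // client disconnected
			return
		}
	}
}

func main() {
	b := NewBroker()
	ch := b.Subscribe() // stands in for one connected SDK
	delivered := b.Publish(`{"flag":"new-checkout","on":true}`)
	fmt.Println(delivered, <-ch)
}
```

To serve real connections, the broker could be mounted with http.Handle on a path of your choosing. The choice to drop messages for slow clients is an assumption of this sketch; a production service would need a policy for resynchronizing clients that fall behind.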

At scale, everything is a tight loop

Some of our analytics services receive tens of thousands of requests per second. One of the biggest things we've learned over the past year is that at this scale, there's almost no such thing as premature optimization. Because of the sheer volume of requests, every handler you write is effectively running in a tight loop. We found that to keep meeting our service level objectives and cost goals at scale, we had to do two things repeatedly:

  1. Profile aggressively to identify and address CPU and memory bottlenecks
  2. Apply a set of micro-patterns to handle specific workloads

Profiling must be done periodically, as new bottlenecks will constantly emerge as traffic scales and old bottlenecks are eliminated. As an example, at one point, we found that the "front-door" microservice for our analytics pipeline was CPU-bound parsing JSON. We switched from Go's built-in encoding/json package to easyjson, which uses compile-time specialization to eliminate slow runtime reflection in JSON parsing.
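As a rough illustration of the profiling loop, the sketch below captures a CPU profile around a JSON-decoding hot path using Go's standard runtime/pprof package. The event type, the payload, and the iteration count are made up for the example; a long-running service would more likely expose profiles over HTTP with net/http/pprof.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"runtime/pprof"
)

// event is a hypothetical analytics payload, not LaunchDarkly's actual schema.
type event struct {
	Kind string `json:"kind"`
	Key  string `json:"key"`
	User string `json:"user"`
}

// decodeMany parses the same payload n times, standing in for a hot request handler.
func decodeMany(payload []byte, n int) (int, error) {
	decoded := 0
	for i := 0; i < n; i++ {
		var e event
		if err := json.Unmarshal(payload, &e); err != nil {
			return decoded, err
		}
		decoded++
	}
	return decoded, nil
}

// profileDecode runs decodeMany under the CPU profiler and returns the raw profile bytes.
func profileDecode(payload []byte, n int) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	_, err := decodeMany(payload, n)
	pprof.StopCPUProfile() // stops sampling and flushes the profile into buf
	return buf.Bytes(), err
}

func main() {
	payload := []byte(`{"kind":"feature","key":"new-checkout","user":"u-123"}`)
	prof, err := profileDecode(payload, 200000)
	if err != nil {
		panic(err)
	}
	// Writing prof to a file and opening it with `go tool pprof` shows where CPU time goes.
	fmt.Printf("captured %d bytes of CPU profile\n", len(prof))
}
```

In a profile of a handler like this, time spent in reflection-based decoding would show up under encoding/json, which is the kind of signal that motivates switching to a generated parser.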

We also identified a set of "micro-patterns" that we have extracted as self-contained libraries so they can be applied in appropriate contexts. Some examples:

  • Read coalescing: In a read-heavy workload, concurrent requests for the same expensive data can wait on the first in-flight read and share its result instead of each hitting the backend, a kind of memoization. This pattern is encapsulated in Google's singleflight package (golang.org/x/sync/singleflight).
  • Write coalescing: The dual of read coalescing. In a write-heavy workload where the last write wins, pending writes can be queued and discarded in favor of the latest write attempt.
  • Multi-layer caching: In scenarios where an in-process, in-memory cache is necessary for performance, horizontal scaling can reduce cache hit rates, because requests for the same key are spread across more independent caches. We make our fleet more resilient to this effect by employing multiple layers of caching, for example backing an in-memory cache with a shared Redis cache before finally falling back to a slower persistent disk-backed store.

These simple patterns improved performance at scale and also helped us deal with bad traffic patterns like reconnection storms.
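Write coalescing under last-write-wins semantics can be sketched in the same spirit. The WriteCoalescer type, its method names, and the key below are hypothetical: writes are buffered per key, each new write replaces the pending one, and a flush issues a single write per key.

```go
package main

import (
	"fmt"
	"sync"
)

// WriteCoalescer buffers writes per key and keeps only the latest value (hypothetical type).
type WriteCoalescer struct {
	mu      sync.Mutex
	pending map[string]string
	sink    func(key, value string) // the expensive write, e.g. a database update
}

func NewWriteCoalescer(sink func(key, value string)) *WriteCoalescer {
	return &WriteCoalescer{pending: make(map[string]string), sink: sink}
}

// Put records a write; any earlier unflushed value for the same key is discarded.
func (w *WriteCoalescer) Put(key, value string) {
	w.mu.Lock()
	w.pending[key] = value
	w.mu.Unlock()
}

// Flush sends the latest value for each key to the sink and
// returns the number of writes actually issued.
func (w *WriteCoalescer) Flush() int {
	w.mu.Lock()
	batch := w.pending
	w.pending = make(map[string]string) // swap so Put is not blocked while the sink runs
	w.mu.Unlock()
	for k, v := range batch {
		w.sink(k, v)
	}
	return len(batch)
}

func main() {
	store := map[string]string{}
	wc := NewWriteCoalescer(func(k, v string) { store[k] = v })
	for i := 0; i < 1000; i++ {
		wc.Put("user:42:last-seen", fmt.Sprintf("t%d", i))
	}
	n := wc.Flush()
	fmt.Println(n, store["user:42:last-seen"]) // prints: 1 t999
}
```

In a running service, Flush would typically be driven by a timer or a size threshold; that scheduling is left out here to keep the sketch short.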

Get good at managing change

Scaling up isn't just about improving your services and architecture. It requires equal investment in people, processes, and tools. On the process and tools front, one thing we really focused on is understanding change. Better visibility into changes being made to the service had a massively positive impact on service reliability. Here are a few things we did to improve visibility:

  • Internal changelog service: This service catalogues intentional changes being made to the system. This includes deploys, instance type changes, configuration changes, feature flag changes, and more. Anything that could potentially impact the service (either in a positive or negative way) is catalogued here. We couldn't find anything off the shelf here, so we built something ourselves.
  • COGS (cost of goods sold) log: Very similar to our changelog, but focused on price changes to our services. If we scale out a service, or change instance types, or make reserved instance reservations, we add an entry to this log. For us, this is just a Confluence page.
  • Observability / APM: We use a number of services to gain observability into what is happening to our service at runtime. We use a mix of Graphite / Grafana and Honeycomb.io to give us the observability we need. We're big fans of Honeycomb here.
  • Operational and release feature flags: We feature flag most changes using LaunchDarkly. Most new changes are protected by release flags (short-lived flags that are used to protect the initial rollout and rollback of a feature). We also create operational flags—which are long-lived flags that act as control switches to the application. Observability lets us understand change, and feature flags allow us to react to change to maintain availability or improve user experience.
  • Spinnaker / Armory: LaunchDarkly is almost five years old, and our methodology for deploying was state of the art... for 2014. We recently undertook a project to modernize the way we deploy our software, moving from Ansible-based deploy scripts that executed on our local machines to using Spinnaker (along with Terraform and Packer) as the basis of our deployment system. We've been using Armory's enterprise Spinnaker offering to make this project a reality.
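The operational flags mentioned in the list above can be pictured as a control switch around an expensive code path. The sketch below uses a hypothetical in-memory FlagStore rather than the LaunchDarkly SDK, and the flag key and handler are invented for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// FlagStore is a hypothetical in-memory flag source; it is NOT the LaunchDarkly SDK.
type FlagStore struct {
	mu    sync.RWMutex
	flags map[string]bool
}

func NewFlagStore() *FlagStore {
	return &FlagStore{flags: make(map[string]bool)}
}

// Set updates a flag, as a dashboard change pushed over the stream would.
func (s *FlagStore) Set(key string, on bool) {
	s.mu.Lock()
	s.flags[key] = on
	s.mu.Unlock()
}

// Enabled returns the flag's value, or fallback if the flag is unknown.
func (s *FlagStore) Enabled(key string, fallback bool) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if on, ok := s.flags[key]; ok {
		return on
	}
	return fallback
}

// handleSearch uses a long-lived operational flag as a kill switch for an expensive path.
func handleSearch(flags *FlagStore, query string) string {
	if flags.Enabled("ops.enable-full-text-search", true) {
		return "full-text results for " + query
	}
	return "cached results for " + query // degraded but cheap path
}

func main() {
	flags := NewFlagStore()
	fmt.Println(handleSearch(flags, "checkout"))
	// An operator flips the switch during an incident; no deploy is needed.
	flags.Set("ops.enable-full-text-search", false)
	fmt.Println(handleSearch(flags, "checkout"))
}
```

A release flag would look the same in code but default to off and be removed once the rollout completes, whereas an operational flag like this one stays in place.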

Like the sound of this stack? Learn more about LaunchDarkly.
