Nine Experimentation Best Practices


This post is by Dawn Parzych of LaunchDarkly.

Running experiments helps you bridge gaps between technical and business teams as you learn about user engagement patterns and application behavior. Experiments can validate that your development teams’ efforts align with your business objectives. You can configure an experiment from any feature flag (front-end, back-end, or operational), but what makes a good experiment?

Experiments use metrics to validate or disprove gut feelings about whether a new feature is meeting customers’ expectations. With an experiment, you can create and test a hypothesis regarding any aspect of your feature development process.

  • Will adding white space to the page result in people spending more time on the site?
  • Do alternate images lead to increased sales?
  • Will adding sorting to the page significantly increase load times?

Experiments provide concrete measurements and reporting to ensure that the version of a feature you launch has a positive impact on company metrics.

General experimentation best practices

Let’s first talk about best practices around creating an experiment. Even with a solid foundation of feature flagging in your organization, missteps in these areas can yield flawed results. Consider these best practices for experimentation.

1. Create a culture of experimentation

Experiments can help you prove or disprove a hypothesis, but only if you are willing to trust the outcome and not try to game the experiment. Creating a culture of experimentation means:

  • People feel safe asking questions and questioning the answers. Sometimes the results of an experiment may not be what you expected. It’s OK to question the results and explore anomalies.
  • Ideas are solicited from all team members—business stakeholders, data analysts, product managers, and developers.
  • A data-driven approach puts metrics first rather than treating them as something only useful in dashboards after a feature ships.

Part of creating a culture of experimentation is providing the tools and training to allow teams to test and validate their features.

Tools needed for experimentation:

  • Tools to collect relevant metrics, including monitoring, observability, and business analytics tools.
  • Tools to help you segment users.
  • Tools to clean and analyze the results.

2. Define what success looks like as a team

Experiments help you determine when a feature is good enough to release. How do you define what “good enough” means, and who is involved in creating the definition? Experiments can involve a large cross-functional team or only a handful of people, depending on the focus of the experiment. If an experiment concerns whether to add a “free trial” button to the home page, you may need to involve people from demand generation, design, and business development.

A good experiment requires goals and metrics that are well defined and agreed upon by all stakeholders. Ask yourself, “What does success look like?” Success means improving a specific metric by a specific amount; “waiting to see what happens” is not a goal befitting a true experiment. Goals need to be concrete and avoid ambiguous words. Tie goals and metrics to business objectives, such as increasing revenue, increasing time spent on a page, or growing the subscription base.

Examples of poor goals include vague and ambiguous statements:

  • Users will be happier with the home page.
  • The response time of search results will improve.

Examples of better goals include concrete statistics:

  • Paid conversions from a trial will improve by 7%.
  • The response time of search results will decrease by 450ms.

Looking at a single metric is good; looking at related metrics is even better. Identify complementary metrics and examine how they behave. If you get more new subscribers, that’s great, but you also want to look at the total number of subscribers. Say you gain new subscribers, but the change inadvertently causes existing subscribers to cancel, and your total number of subscribers drops as a result. Does that make the experiment a success? I would say no.
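
To make the agreed-upon definition concrete, it helps to write the success criteria down as data before the experiment starts. Here is a minimal sketch in Python (the metric names and numbers are hypothetical) that pairs a primary goal with a complementary guardrail metric, so a win on one can’t hide a loss on the other:

```python
from dataclasses import dataclass

@dataclass
class SuccessCriterion:
    """A pre-agreed definition of success for one metric."""
    metric: str            # e.g. "trial_to_paid_conversion_rate"
    direction: str         # "increase" or "decrease"
    minimum_change: float  # relative change agreed on before the experiment

    def is_met(self, control: float, treatment: float) -> bool:
        change = (treatment - control) / control
        if self.direction == "increase":
            return change >= self.minimum_change
        return change <= -self.minimum_change

# Agreed on before the experiment starts, not after the data comes in.
primary = SuccessCriterion("trial_to_paid_conversion_rate", "increase", 0.07)
guardrail = SuccessCriterion("total_subscribers", "increase", 0.0)  # must not drop

print(primary.is_met(control=0.120, treatment=0.131))      # True: +9.2% lift
print(guardrail.is_met(control=50_000, treatment=49_100))  # False: net loss
```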

3. Statistical significance

The number of samples you need will depend on your application and your weekly or monthly active users. Whatever the total, it is important to keep sample sizes relatively equal across cohorts; significantly more samples in one cohort can skew the results.

Think about how and when users access your application when starting an experiment. If users primarily use your application on weekends, the experiment should include those days. And if users typically visit a site several times over a couple of weeks before converting, conversions from visits that began before the experiment started can skew early results, showing a positive effect where the true effect is neutral or negative.
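
If you want a rough sense of how many samples you need before reaching for a stats package, the standard two-proportion power calculation is easy to sketch. The example below uses only the Python standard library; the 12% and 13% conversion rates are hypothetical:

```python
import math
from statistics import NormalDist

def samples_per_cohort(p_control: float, p_treatment: float,
                       alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate samples needed per cohort for a two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)          # desired statistical power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)

# Detecting a lift from a 12% to a 13% conversion rate takes roughly
# 17,000 users per cohort; smaller effects need far larger samples.
print(samples_per_cohort(0.12, 0.13))
```

Because the required sample grows with the inverse square of the effect size, halving the effect you want to detect roughly quadruples the traffic you need.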

4. Proper segmentation

There are two aspects to segmentation. First, how will you segment your users, and second, how will you segment the data? A successful experiment needs two or more cohorts. How you segment your users will vary based on your business and users. Some ways to segment users:

  • Logged in vs. anonymous
  • By company
  • By geography
  • Randomly

However you segment users, make sure the sample sizes are balanced; skewed cohorts produce skewed results.
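
One common way to get random-but-sticky assignment is to hash the user key: the same user always sees the same variation, and a uniform hash keeps cohorts balanced as traffic grows. This is a generic sketch, not LaunchDarkly’s internal bucketing (LaunchDarkly handles assignment for you when you configure rollout percentages):

```python
import hashlib

def assign_cohort(user_key: str, experiment: str, cohorts: list[str]) -> str:
    """Deterministically assign a user to a cohort.

    Hashing the experiment name together with the user key means the
    same user can land in different cohorts across different experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_key}".encode()).hexdigest()
    return cohorts[int(digest, 16) % len(cohorts)]

# Hypothetical user and experiment names.
print(assign_cohort("user-8134", "free-trial-button", ["control", "treatment"]))
```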

When analyzing the data after an experiment, you may want to do additional segmentation to see if the results vary based on other parameters.

In a post on Airbnb’s experimentation blog, the team described an experiment that returned neutral results when they were expecting an uplift. Analyzing the results per browser, they uncovered a bug in Internet Explorer that skewed the results. Once the bug was fixed, they saw the improvement they expected. Without a culture of experimentation in which it is OK to question results and look for answers, this feature might never have rolled out.

Caution: don’t go digging for data to support your hypothesis. Additional segmentation should be done to understand anomalies better, not to make a case for calling the experiment a success.

5. Recognize your biases

And this brings us to the subject of biases. Everybody is biased. This isn’t a bad thing; it is a matter of how our brains work. Understanding your biases and the biases of others around you will help when running experiments. Our biases influence how we process information, make decisions, and form judgments.

Some biases that may creep in:

  • Anchoring bias – our tendency to rely on the first piece of information we receive.
  • Confirmation bias – searching for information that confirms our beliefs, preconceptions, or hypotheses.
  • Default effect – the tendency to favor the default option when presented with choices.
  • And my personal favorite, the bias blind spot – our belief that we are less biased than others.

It is common in an experiment to scrub the data to remove outliers. This is acceptable, but make sure you aren’t eliminating outliers due to bias. Don’t invalidate the results of your experiment by removing data points that don’t fit the story.
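
If you do scrub outliers, decide on the rule before you look at the results and apply it identically to every cohort. A minimal sketch, using a standard 1.5 × IQR rule on hypothetical session times:

```python
from statistics import quantiles

def trim_outliers(values: list[float]) -> list[float]:
    """Drop points outside 1.5 * IQR; the same rule for every cohort."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if lo <= v <= hi]

# Session times in seconds. The 1450 s point (a tab left open overnight,
# perhaps) is dropped; every other point survives.
control = [31.0, 28.5, 30.2, 29.8, 27.9, 30.6, 29.1, 31.4, 28.8, 1450.0]
print(trim_outliers(control))
```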

6. Conduct a retro

Experiments are about learning. So after every experiment, hold a retrospective. Ask:

  • Was the experiment a success? Did we get the results we expected? Why or why not?
  • What questions were raised by the experiment? What questions were answered?
  • What did we learn?

And most importantly, should we launch the feature? You can still decide to launch a feature if the results of an experiment were neutral.

Experimentation feature flags best practices

Now that we’ve covered the basics of experiment best practices, let’s dive into best practices for using feature flags in your experiments. Well-done feature flagging is the foundation of a well-run experiment.

If you’ve embraced feature management as part of your teams’ DNA, any group that wants to run an experiment can quickly do so without involving engineering. Follow these best practices to get the most from your existing feature flags.

7. Consider experiments during the planning phase

The decision of whether to wrap a feature in a flag begins during the planning phase. This is also the right time to think about experiments. When creating a feature, evaluate the metrics that will indicate whether the feature is a success—clicks, page load time, registrations, sales, etc.

Talk with the various stakeholders to determine whether an experiment may be necessary, and give them the essential details, such as the flag’s name, that they will need to configure an experiment.
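
In code, the only things an experiment needs from engineering are the flag evaluation and the metric event. Here is a minimal sketch with LaunchDarkly’s server-side Python SDK; the flag key, event key, and user below are hypothetical, and exact call signatures vary by SDK version (newer versions use contexts rather than user dicts), so check the docs for yours:

```python
import ldclient
from ldclient.config import Config

ldclient.set_config(Config("YOUR_SDK_KEY"))
client = ldclient.get()

user = {"key": "user-8134", "country": "US"}

# The flag key decided during planning; stakeholders need this name
# to configure the experiment.
if client.variation("home-page-free-trial-button", user, False):
    print("render the free-trial button")  # stand-in for real rendering code

# Record the metric the experiment measures.
client.track("trial-signup-click", user)
```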

8. Empower others

When giving others the ability to run experiments, make sure you have the proper controls in place to avoid flags accidentally being changed or deleted. Empower other teams to create targeting rules, define segments, and configure roll-out percentages while preventing them from toggling or deleting flags.
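
In LaunchDarkly, this kind of guardrail maps to a custom role. The sketch below shows the general shape of such a policy as a Python data structure; the action and resource names are illustrative, so check the custom roles documentation for the exact vocabulary:

```python
# An "experimenter" policy: allow editing targeting rules and rollouts,
# deny toggling and deleting flags. Names below are illustrative.
experimenter_policy = [
    {
        "effect": "allow",
        "actions": ["updateRules", "updateTargets", "updateFallthrough"],
        "resources": ["proj/*:env/production:flag/*"],
    },
    {
        "effect": "deny",
        "actions": ["updateOn", "deleteFlag"],
        "resources": ["proj/*:env/production:flag/*"],
    },
]
```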

9. Avoid technical debt

Experimentation flags should be short-lived. This means removing a flag once an experiment completes and the feature is either rolled out to 100% of users or shelved. Put processes and reminders in place to remove the flag. We highlighted some creative ways to do this in our blog post on short-term and permanent flag best practices.
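
The cleanup itself is usually a small diff. A hypothetical before-and-after, assuming the sorting experiment won and shipped to everyone:

```python
from dataclasses import dataclass

@dataclass
class Result:
    title: str
    relevance: float

results = [Result("a", 0.2), Result("b", 0.9)]

# While the experiment ran, the code branched on the flag
# ("search-results-sorting" and the client are hypothetical, evaluated
# as in the earlier SDK sketch):
#
#     if client.variation("search-results-sorting", user, False):
#         results = sorted(results, key=lambda r: r.relevance, reverse=True)
#
# After the experiment: sorting shipped to 100% of users, so the flag
# is removed and the winning branch is the only code path.
results = sorted(results, key=lambda r: r.relevance, reverse=True)
print([r.title for r in results])
```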

To get started with your own experiments, check out the LaunchDarkly documentation.

Are you looking for other best practices? Check out the other blog posts in the series.

  • Long-term and short-term flags best practices
  • Operational flags best practices
  • Release management flags best practices
