Scaling Clearbit to 2M API Requests Per Day

APIs for determining who's behind an email address

By Harlow Ward, Developer and Co-founder at Clearbit.

Clearbit builds Business Intelligence APIs. Our suite of APIs is focused on Lead Enrichment and Automated Research.

Clearbit lookup example

Our goal is to help modern businesses make better data-driven decisions. Our platform aggregates data from hundreds of public sources and packages it up into beautifully hand-crafted JSON payloads.

Customers use our APIs to:

  • Give their sales team more information on customers, leads, and prospects.
  • Integrate and surface person/company data to the end-users of their systems.
  • Underwrite transactions and reduce fraud.

Outside of our paid products we also love releasing free products. These bite-sized APIs are hyper-focused on helping designers and developers enhance the user experience of their tools and systems.

A few of these freebies include:

Engineering at Clearbit

Our engineering team consists of three developers: Alex MacCaw (also our fearless CEO), Rob Holland, and myself.

We are a small dev team, and that means we all wear a lot of hats. Day-to-day, it’s not uncommon to jump between Frontend HTML/JS/CSS, API design, Service administration, DB administration, Infrastructure management, and of course a little customer support.

Services Everywhere

We made the decision early on to build a microservice-first architecture. This means our system is composed of lots of tiny Single Responsibility Services (SRS anyone?).

In general these services are written in Ruby, leverage Sinatra to expose JSON endpoints, and use RSpec to verify accuracy. Each service maintains its own datastore; depending on the service's needs we’ll typically choose from Amazon RDS, Amazon DynamoDB, or hosted Elasticsearch with Found.
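To make the shape of one of these services concrete, here is a minimal sketch. It is not our production code: the `CompanyLookupService` name, the `/companies/` path, and the in-memory Hash standing in for a datastore are all assumptions for illustration. It uses the bare Rack interface that Sinatra is built on, so that it runs without any gems; in Sinatra the same endpoint would be a single `get` route.

```ruby
require "json"

# Hypothetical single-responsibility service: looks up a company by domain.
# The Hash below is an assumption standing in for the service's own datastore.
class CompanyLookupService
  COMPANIES = {
    "example.com" => { "name" => "Example Inc", "domain" => "example.com" }
  }.freeze

  # Rack interface: takes the request env, returns [status, headers, body].
  def call(env)
    domain = env["PATH_INFO"].to_s.delete_prefix("/companies/")
    company = COMPANIES[domain]

    if company
      json(200, company)
    else
      json(404, { "error" => "not_found" })
    end
  end

  private

  # Serializes the payload and wraps it in a Rack response triple.
  def json(status, payload)
    [status, { "content-type" => "application/json" }, [JSON.generate(payload)]]
  end
end
```

Because the service is just an object that responds to `call`, it can be exercised in a spec without booting a web server.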

There are some great arguments to be made about a MonolithFirst architecture. However, in our case, we felt our data boundaries were reasonably clear from the beginning, and this allowed us to make a few low-risk bets around building and running a microservice-first architecture. So far so good!

Our web services fall into two categories:

  1. External (publicly accessible, authenticated via API keys).
  2. Internal (accessible within VPC, locked down to specific security groups).

At any given time we’re running 70+ different internal services across a cluster of 18 machines. Our external (customer facing) APIs are serving upwards of 2 million requests per day, and that number is rapidly increasing.

Early Days

When working with a microservice architecture it's difficult to overstate how important it is for a developer to be able to quickly push a new web service.

Our initial architecture was built on Amazon EC2 and leveraged dokku-alt (a Docker powered mini-Heroku) to manage deployments.

Dokku-alt covered our basic requirements:

  • Git based deploys.
  • Managing ENV vars outside of config files.
  • Ability to roll back in case of emergency.

However, as the number of servers grew some shortcomings of dokku-alt began to emerge. This was no fault of dokku-alt; we were just outgrowing our architecture.

As we added more machines the problems compounded. The per-machine configuration management we had initially loved quickly became unsustainable. On top of that, running git push production master simultaneously to every box in the cluster made for some nerve-racking deploys.

The state of our deployment system was beginning to take a toll on the team's productivity. It was time to make a change. We collectively decided to explore our options.

Current Stack

As our infrastructure grew, our deployment requirements also evolved:

  • Distributed configuration management.
  • Git push to only one repository.
  • Blue/Green style deploys.
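The Blue/Green requirement can be sketched as follows. This is an in-memory model only, under stated assumptions: the `BlueGreenRouter` class is hypothetical and stands in for a real proxy, and the health check is whatever block the caller supplies. The idea it illustrates is that the new version goes to the idle color first, and traffic flips only once that version looks healthy.

```ruby
# Hypothetical in-memory model of a blue/green switch. The router is an
# assumption standing in for a real proxy layer.
class BlueGreenRouter
  attr_reader :live

  def initialize(live: :blue)
    @live = live
  end

  # The color that is not currently receiving traffic.
  def idle
    live == :blue ? :green : :blue
  end

  # Releases the version to the idle color, then flips traffic only if the
  # caller-supplied health check block returns true.
  # Returns true when the switch happened, false otherwise.
  def deploy(version, releases)
    target = idle
    releases[target] = version
    return false unless yield(target, version)

    @live = target
    true
  end
end
```

A failed health check leaves the live color untouched, which is what makes this style of deploy easy to back out of.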

After looking into solutions like Deis and Flynn, we decided we'd feel happier with something with simpler semantics. We were attracted to Fleet because of its simplicity and flexibility, and the reputation of the CoreOS team.

Co-ordinating configuration between machines became a breeze with the use of etcd. Now when our deployer app builds a new Docker container we can inject environment variables from etcd directly into the container.
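As a rough illustration of the injection step, the sketch below turns a set of key/value pairs into `docker run` arguments using Docker's `-e` flag. It is not our deployer: the `docker_run_args` helper and the image name are hypothetical, and the `config` Hash is an assumption standing in for values that would be read from etcd.

```ruby
# Builds the argument list for `docker run`, passing each config pair as an
# -e KEY=VALUE flag. `config` is a plain Hash standing in for values that a
# deployer would read from etcd (an assumption for this sketch).
def docker_run_args(image, config)
  # Sort the pairs so the generated command is stable between runs.
  env_flags = config.sort.flat_map { |key, value| ["-e", "#{key}=#{value}"] }
  ["docker", "run", "-d", *env_flags, image]
end

# Hypothetical usage: the result can be handed to system(*args).
args = docker_run_args(
  "example/lookup-service",
  { "PORT" => "5000", "DATABASE_URL" => "postgres://db/app" }
)
```

Returning an argument array rather than a single string avoids shell quoting issues when a value contains spaces or special characters.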

From there, we use Fleet to distribute the units across our cluster of servers. We’ve found fleet-ui super handy for visualizing the distribution of units across our cluster.


To keep our operational expenses down, we have a static pool of on-demand EC2 instances running the etcd quorum, HAProxy, and several of the HTTP front ends. On top of that, we leverage a dynamic pool of EC2 Spot Instances to handle the dynamic nature of our workloads during times of extremely high throughput.

Word to the wise: don’t use Spot Instances as part of your etcd quorum. When the Spot Price rises above your bid (and it will), the Spot Instances will disappear without warning.
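The arithmetic behind that warning is worth spelling out. etcd needs a majority of its members to be reachable, so a cluster of n members has a quorum of n / 2 + 1 (integer division). The sketch below works through it; the cluster sizes are hypothetical examples, not our actual topology.

```ruby
# Smallest number of members that forms a majority of the cluster.
def quorum_size(members)
  members / 2 + 1
end

# How many members can be lost while a majority is still reachable.
def failure_tolerance(members)
  members - quorum_size(members)
end

# True if the cluster keeps its majority after every Spot-backed member
# disappears at once. Both counts are hypothetical inputs.
def survives_spot_loss?(members, spot_members)
  members - spot_members >= quorum_size(members)
end
```

For a hypothetical 5-member cluster the quorum is 3, so it tolerates 2 failures. If 3 of those 5 members ran on Spot Instances, a single price spike could take the cluster below quorum.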


It’s hard to overstate how important it’s been for us to have a deep and instantly available understanding of the current state of all our services.

Starting from the outside, we use Runscope to continually ping and analyze responses from our services. It’s been instrumental in verifying and maintaining the APIs with dynamic date versioning.

Digging a level deeper, we use Librato for measuring and monitoring lower level system behaviour. We’re diligent about creating alerts that will notify the team if anything seems awry.

Sentry notifies us immediately via Slack and Email if any of our services are throwing errors. We’re big believers in the Broken windows theory, and try to keep Sentry as clean as possible.

Finally, we use SumoLogic as our log aggregation platform. We run Sumo Collectors on each of our hosts. SumoLogic is our last line of defense for spotting inconsistent system behaviour and debugging historical issues.

Looking Forward

We have a private contrib repo with a handful of rack middlewares that are shared across our services. These middlewares dramatically cut down on duplication of code around Authentication, Authorization, Rate Limiting, and IP Restrictions.
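For readers who haven't written one, a shared middleware of this kind looks roughly like the sketch below. It is not taken from our contrib repo: the `ApiKeyAuth` class, the `X-Api-Key` header, and the `valid_keys` option are all assumptions for illustration. It follows the standard Rack middleware shape (wrap an app, implement `call(env)`) and needs no gems.

```ruby
require "json"

# Hypothetical Rack-style middleware that rejects requests without a known
# API key. The header name and key list are assumptions for this sketch.
class ApiKeyAuth
  def initialize(app, valid_keys:)
    @app = app
    @valid_keys = valid_keys
  end

  def call(env)
    # Rack exposes the X-Api-Key request header as HTTP_X_API_KEY.
    key = env["HTTP_X_API_KEY"]
    return unauthorized unless @valid_keys.include?(key)

    # Key is known: pass the request down to the wrapped app.
    @app.call(env)
  end

  private

  def unauthorized
    body = JSON.generate({ "error" => "unauthorized" })
    [401, { "content-type" => "application/json" }, [body]]
  end
end

# Hypothetical usage: wrap any Rack-compatible app.
inner_app = ->(_env) { [200, { "content-type" => "text/plain" }, ["ok"]] }
protected_app = ApiKeyAuth.new(inner_app, valid_keys: ["secret-key"])
```

Each service wraps itself in the same class, so the authentication logic lives in one place instead of being copied into every service.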

In general, the shared middleware approach has worked well for us. However, as we look to the future and the team continues to experiment with new languages, the Ruby middlewares can’t be shared across new languages in the polyglot system.

Our goal is to push this shared logic out of the services and into the proxy layer (possibly with the help of VulcanD, Kong, or some custom HAProxy foo).

If you have made a transition like this before, or have an elegant idea of how to somersault this hurdle, I’d love to buy you a beverage.
