How Sendwithus Sent Their First Billion Emails

Editor's note: By Brad Van Vugt, ‎Cofounder at Sendwithus

What is Sendwithus?

Sendwithus is a platform for sending templated emails. We make it easy for product teams to manage and track automated emails being sent to their customers.

awesometemplate management with sendwithus

Customers use Sendwithus for

Welcome and Onboarding Emails
Notifications and Invitations
Product Updates
Invoices
Password Resets
etc

Typically automated emails are stored in source code. Making changes requires a ridiculous amount of time and effort, or in most cases, isn’t even possible. This is particularly true for larger organizations with multiple teams.

Our goal is to make automated emails collaborative and easy to manage. Instead of hardcoding email templates in HTML files, developers integrate with our API and use Sendwithus to store, send, manage, and update templates in real-time. Customers can view detailed analytics, run A/B tests, schedule drip campaigns, and manage localized content without ever updating code.

Sendwithus-cli in action

Today we power more than 10M emails each day, and process over 100M API requests between our customers and their Email Service Provider (ESP).

Engineering at Sendwithus

Our engineering team is relatively small, given the scale of our service. We have six full-time engineers and most of us work on all aspects of the platform, from backend architecture, to frontend interfaces, to open source projects.

We also have two customer success engineers that serve as our QA and support team, hunting down and resolving bugs as they occur and rigorously field-testing new features.

the sendwithus team

Initial Architecture

Sendwithus started in late 2012 as a static Django application, deployed to Amazon EC2. We ran several product and marketing experiments (affectionately named Sendwithus Zero) in an effort to learn more about who our customers were and what problems they were looking to solve.

When the time came to actually build a working product, we stuck with Django. We added Gunicorn to power our web servers, and a single PostgreSQL database storage layer to the backend.

We subscribed to the MonolithFirst pattern, with a single code base responsible for frontend, backend, api, and background workers. It worked great and let our engineering team (of two) move lightning fast.

We also switched from AWS to Heroku, taking advantage of Heroku Postgres. This was one of the best decisions we made early on. Heroku has its disadvantages (namely price and limited configuration control) but not worrying about deployment procedures and server maintenance allowed us to build and iterate very fast, focusing on what matters most. Today we still use Heroku to power some of our applications, simply because it’s so easy and fast to develop on. Our higher throughput applications have moved to AWS, primarily due to operating cost.

Early Scaling

As our first API customers came on board, we added RabbitMQ and Celery for background job processing. We initially chose a hosted RabbitMQ provider through the Heroku Add-on system. This was ok, but didn’t scale very well; more on that later.

We also added MongoDB for user-facing analytics, and Memcache to speed up our servers. We also used this Heroku Buildpack for PostgreSQL connection pooling (we still use a modified version of that buildpack today).

Our Django, PostgreSQL, Heroku stack served us well and apart from some performance tweaking here and there, got us through the early months with ease.

Queueing Millions of Events

Queueing makes up a large portion of our infrastructure needs, from queueing emails for rendering and delivery, to queueing bounce and click events for processing. Our queueing systems must also handle large, unexpected bursts of traffic.

event queue throughput

Our hosted RabbitMQ + Celery solution was first to show signs of scaling problems. We’d outgrown the capabilities of a single RabbitMQ instance and needed something that could handle more throughput.

We opted to try IronMQ, which offers a great hosted queueing service. We used IronMQ for almost a year before switching to AWS SQS. SQS has been rock solid and reasonably priced; I can’t imagine switching off anytime soon. Today we operate 50+ queues, pushing more than 100M messages/day.

Processing and Storing Millions of Events

Our second major hurdle was PostgreSQL throughput. As our customers grew so did our database needs. We had nearly 2TB of email logs and metadata data in our single PostgreSQL instance, and we were adding more than 10 GB each day.

We explored a wide variety of options, including Cassandra, MongoDB, and CitusDB, but ultimately landed on AWS DynamoDB. Our engineering team has written about our migration to DynamoDB and performance issues we ran into along the way. Today we’re very happy to have migrated when we did.

Current Architecture: Dedicated Applications and Services

To date we’ve processed nearly two billion emails for our customers. We still run the original Django monolith, but we’ve moved core pieces of functionality into dedicated applications and services. These aren’t exactly microservices - some of them are full-fledged public facing apps with complex storage layers.

The entire Sendwithus product is divided into five major applications: Landing, Application, API, Delivery, and Analytics. Each application started as a fork of the original Django project and was allowed to evolve on it’s own timeline. This meant storage layers, deployment methods, even programming languages could be selected specifically for each application.

For example, our Analytics application is currently written in Go and uses a combination of Redis, PostgreSQL, and DynamoDB to achieve high-throughput analytics and events processing. It deploys to Heroku and uses net/http and HttpRouter to serve web requests.

Our Landing application is a Jekyll project that powers our public web pages. It’s uses gulp to compile and deploys to Cloudfront using AWS S3. We also build open source projects whenever possible. Our API Docs, Victoria Startup Job Board and Template Gallery are powered by separate, open source repositories. This is great for collaborating with our customers; it’s pretty cool to show a customer a commit that fixes a bug they’ve reported.

another satisfied customer

Monitoring

As a general rule, each application uses common monitoring and logging pieces:

Sentry for error capturing
Papertrail for collecting and searching logs
New Relic, Librato, and CloudWatch for performance monitoring
Slack for real-time notifications

Developer Workflow

We use Github to store all our code repositories. Each repo maintains multiple branches, and we use Github Pull Requests for code reviews. This works great for now, but will likely change as the team grows.

We’ve also built a custom command line tool that allows anyone on our team to deploy any application by simply running the command swu deploy. The deploy process will run tests locally and then perform a rolling deploy of the application. Deployment logs are piped into Slack, providing our team with a comprehensive and searchable deploy history.

The Future

Our team is beginning to widely adopt Go, Docker, CoreOS, and EC2 for new infrastructure projects, and I imagine that will only increase as we scale. It’s likely we’ll continue to use Python, Django, and Heroku for low-throughput and UI heavy applications.

Looking to the future, we’re working to break our major applications down into smaller microservices. What these services do and how much code is shared between them is still to be determined, although we already have some good ideas.. And of course we'll continue to use our own API to power our emails :)

Utilities

Jekyll

Static Site Generators

sendwithus

Transactional Email

Business Tools

Slack

Group Chat & Notifications