Here are some stack decisions, common use cases and reviews by companies and developers who chose HAProxy in their tech stack.
When accepting events, we would be crazy to just expose the Python web process to the public Internet and say, “Alright, give me all you got!” Instead, we use two different proxying services that sit in front of our web machines:
1) NGINX, our product-aware proxy, handles many of the upper bounds that we have deemed reasonable. It is responsible for a variety of bounds, but the one it enforces most often is protecting Sentry from exceedingly large event volumes. Every so often, a user will run into a problem where they’ve deployed their code out into the abyss, and their event volume clocks in at a few zeroes higher than what they signed up for.
2) In front of NGINX, we use another proxying service called HAProxy, which acts as a delta of connections without any of that product-awareness logic and has much higher throughput. All it does is accept connections and send them off to different NGINX servers, allowing us to gracefully add or remove NGINX servers as we see fit.
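The two tiers described above can be sketched roughly as follows. This is a minimal illustration, not Sentry's implementation: the class names `ConnectionPool` and `EventQuota`, the round-robin policy, and the fixed-window counter are all assumptions made for the example.

```python
from collections import defaultdict


class ConnectionPool:
    """Hypothetical stand-in for the HAProxy tier: no product logic,
    it only hands each connection to the next backend in turn."""

    def __init__(self, backends):
        self._backends = list(backends)
        self._next = 0

    def add(self, backend):
        # Backends can join at any time; new picks start including them.
        if backend not in self._backends:
            self._backends.append(backend)

    def remove(self, backend):
        # Removed backends simply stop receiving new connections.
        self._backends.remove(backend)

    def pick(self):
        if not self._backends:
            raise RuntimeError("no backends available")
        backend = self._backends[self._next % len(self._backends)]
        self._next += 1
        return backend


class EventQuota:
    """Hypothetical stand-in for the product-aware tier: caps the number
    of events each project may send within one window."""

    def __init__(self, limit_per_window):
        self.limit = limit_per_window
        self._counts = defaultdict(int)

    def allow(self, project_id):
        if self._counts[project_id] >= self.limit:
            return False  # over the bound: reject instead of processing
        self._counts[project_id] += 1
        return True

    def reset_window(self):
        # Called at the start of each new window (scheduling not shown).
        self._counts.clear()
```

Keeping the distribution tier free of per-project state is what lets it stay simple and fast, while the quota check lives one hop further in, where product knowledge is available.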
We're using Git through GitHub for public repositories and GitLab for our private repositories due to its easy-to-use features. Docker and Kubernetes are a must-have for our highly scalable infrastructure, complemented by HAProxy with Varnish in front of it. We are using a lot of npm and Visual Studio Code in our development sessions.
Over the past year, we've shifted our philosophy on managed services and have moved several critical parts of our infrastructure away from self-managed options. The most prominent was our shift away from HAProxy to AWS's managed application load balancers (ALBs).
As we scaled, managing our HAProxy fleet became a larger and larger burden. We spent a significant amount of time tuning our configuration files and benchmarking different Amazon EC2 instance types to maximize throughput.
Emerging needs like DDoS protection and auto scaling turned into large projects that we needed to schedule urgently. Instead of continuing this investment, we chose to shift to managed ALB instances. This was a large project, but it quickly paid for itself as we've nearly eliminated the time spent managing load balancers. We also gained DDoS protection and auto scaling "for free".
Around the time of their Series A, Pinterest’s stack included Python and Django, with Tornado and Node.js as web servers. Memcached / Membase and Redis handled caching, with RabbitMQ handling queueing. NGINX, HAProxy and Varnish managed static delivery and load balancing, with persistent data storage handled by MySQL.
The frontline API is proxied through an HAProxy load balancer with NGINX as the front end, which also handles SSL termination. This frontline API consists of 600 stateless endpoints that join together multiple services.
As part of the Marketplace stack, engineers in this area integrate with various other internal services, including logtron, which logs to disk and Kafka, and uber-statsd-client, the Node.js client for statsd.
In early 2013, Airbnb tackled the problem of service discovery and load balancing in the context of a service-oriented architecture (SOA) by building and releasing an open source tool called SmartStack. SmartStack is built on two other open source tools created by Airbnb: Nerve and Synapse.
Nerve is a service registration daemon that performs health checks and “creates ephemeral nodes in Zookeeper which contain information about the address/port combos for a backend available to serve requests for a particular service.”
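The registration loop described above can be sketched as follows. This is an illustration only, not Nerve's code: the function names are hypothetical, the health check is a plain TCP connect, and an in-memory dict stands in for ZooKeeper's ephemeral nodes.

```python
import socket


def tcp_health_check(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def sync_registration(registry, service, host, port, check=tcp_health_check):
    """Register a healthy backend under its service; drop an unhealthy one.

    `registry` is a plain dict standing in for ZooKeeper (an assumption
    made for this sketch). Returns True if the backend is registered
    after the check.
    """
    backends = registry.setdefault(service, set())
    if check(host, port):
        backends.add((host, port))
    else:
        backends.discard((host, port))
    return (host, port) in backends
```

Calling `sync_registration` periodically for each local service approximates the behavior of an ephemeral node: the entry is present only while the backend keeps passing its check.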
Synapse is a transparent service discovery framework for connecting an SOA. It reads the information in Zookeeper about available backends and uses it to configure a local HAProxy process, which then routes requests between clients and services.
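The step of turning a list of discovered backends into HAProxy configuration might look like the sketch below. It is not Synapse's actual output: the function name, the local bind address, and the server naming scheme are assumptions, though the directives used (`listen`, `bind`, `mode`, `balance`, `server ... check`) are standard HAProxy configuration keywords.

```python
def render_haproxy_service(service, listen_port, backends):
    """Render a `listen` section that balances one service across its backends.

    `backends` is an iterable of (host, port) pairs, for example as read
    from a service registry. Sorting keeps the output stable between runs.
    """
    lines = [
        f"listen {service}",
        f"    bind 127.0.0.1:{listen_port}",  # clients connect to localhost
        "    mode http",
        "    balance roundrobin",
    ]
    for index, (host, port) in enumerate(sorted(backends)):
        # `check` asks HAProxy to health-check the server itself as well.
        lines.append(f"    server {service}-{index} {host}:{port} check")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_haproxy_service("search", 3213, {("10.0.0.5", 8080), ("10.0.0.6", 8080)}))
```

Because each client talks only to its local HAProxy on a fixed port, services do not need to know where their dependencies run; regenerating this section whenever the backend list changes is enough to reroute traffic.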