About a year and a half ago (written June 2013) we moved from dedicated servers over to AWS. Thanks to AWS, we no longer have to think on a server level. Instead, we think of everything as a cluster of instances, and an instance is essentially a virtual server where we don’t have to worry about the hardware. It’s a relief to not have to worry about the hardware behind the instances.
The clusters we have are: WWW, API, Upload, HAProxy, HBase, MySQL, Memcached, Redis, and ElasticSearch, for an average total of 80 instances. Each cluster handles the job that its name describes, all working together for the common goal of giving you your daily (hourly?) dose of image entertainment.
Below is a diagram of how they all work together:
Elasticsearch is the engine that powers search on the site. From a high level perspective, it’s a Lucene wrapper that exposes Lucene’s features via a RESTful API. It handles the distribution of data and simplifies scaling, among other things.
Given that we are on AWS, we use an AWS cloud plugin for Elasticsearch that makes it easy to work in the cloud. It allows us to add nodes without much hassle. It will take care of figuring out if a new node has joined the cluster, and, if so, Elasticsearch will proceed to move data to that new node. It works the same way when a node goes down. It will remove that node based on the AWS cluster configuration.