How imgix Built A Stack To Serve 100,000 Images Per Second

imgix was founded on the principle that working with images on the web should be as simple as formatting a date or localizing a currency. To get it exactly right, we have built high-quality image rendering farms, put them behind our fast, globally distributed network infrastructure, and made the entire service accessible through a dead-simple API that can be used in an IMG src attribute.

By Kelly Sutton, Chief Product Officer at imgix.


What We Do

With over 60% of the average webpage’s weight being image content, serving the best image in the smallest payload is an increasingly critical concern for both businesses and developers. Every additional second of load time for a page will increase its bounce rate by 7%, according to KISSMetrics. Imagine losing 7% of the revenue from an e-commerce website simply because the images were suboptimal!

imgix is a real-time image processing and delivery service that works with your existing images. It was designed from the ground up to empower businesses and developers to serve the most optimal images under any circumstance. With a simple URL API, images are fetched and transformed in real time, then served anywhere in the world via CDN.
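
To make the URL API concrete, here is a minimal sketch of building a render URL in Python. The w, h, fit, and auto parameters come from imgix's documented URL API; the source domain and image path are placeholders.

```python
from urllib.parse import urlencode

def imgix_url(source_domain, path, **params):
    """Build an imgix render URL by appending transform parameters
    to the path of an image on an imgix source."""
    return f"https://{source_domain}/{path.lstrip('/')}?{urlencode(params)}"

# Resize, crop, and let imgix negotiate the output format in one request.
# "example.imgix.net" is a placeholder source domain.
url = imgix_url("example.imgix.net", "photos/hero.jpg",
                w=1200, h=630, fit="crop", auto="format")
print(url)
# https://example.imgix.net/photos/hero.jpg?w=1200&h=630&fit=crop&auto=format
```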

Individual customers have been able to eliminate millions of images from storage and generate an infinite number of derivative images on the fly. By resizing, cropping, and adjusting their image content dynamically, they are able to avoid wasting precious bytes and keep their websites snappy.

The Technical Challenge

imgix has more than 80 different URL parameters that can be layered and combined for sophisticated effects. For instance, imgix exposes controls for adjusting lossiness, chroma subsampling rates, color quantization, and more. In addition, imgix can respond to any number of inputs from the various browsers, phones, tablets, and screens displaying an image. The number of parameter and input combinations that can affect an output image grows combinatorially.
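
As an illustration of how these parameters layer, the sketch below combines quality controls with device-aware widths to build a srcset. The q (quality) and chromasub (chroma subsampling) parameters are documented imgix options; the specific values and the domain are illustrative, not recommendations.

```python
from urllib.parse import urlencode

# One master image, four derivative widths, each with tuned quality and
# chroma subsampling. "example.imgix.net" is a placeholder domain.
base = "https://example.imgix.net/photos/hero.jpg"
widths = [320, 640, 1280, 2560]
srcset = ", ".join(
    base + "?" + urlencode({"w": w, "q": 60, "chromasub": 420, "auto": "format"}) + f" {w}w"
    for w in widths
)
print(srcset)
```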



In designing our infrastructure, we have operated under the presumption that every image request we receive will be uncacheable and must be dynamically created. This fundamentally changes how one thinks about building and scaling a service. Over the years, we have built and rebuilt our infrastructure to be able to handle more than 100,000 images per second with a 90th percentile input filesize of 4.5 MB. With the current stack, imgix is able to offer the highest quality images at the fastest speeds and the lowest prices.

Engineering at imgix

The engineering team at imgix is highly skilled and proven at working at large scale. They come from ops and engineering teams at YouTube, Google, Dropbox, Foursquare, and Yahoo. While all of our engineers are polyglots, capable of moving between different languages and frameworks as needed, they trend towards C, Go, Python, and Lua when given the opportunity. The vast majority of our technology is written in one of these languages.

The Early Versions

The initial architecture of imgix was built on top of Amazon EC2. After onboarding the initial batch of customers, it became apparent that we would not be able to offer the performance and quality we wanted to target. Most importantly, having worked at large-scale Internet companies and managed their opex spend, our architects knew from experience the corners we could be painted into by scaling up in the cloud. We made an early and critical decision to build all core infrastructure on our own metal. The question then became: what should we build on?

An early Facebook photos engineer mentioned he had seen remarkable performance and quality coming out of the Apple Core Graphics stack. Though we initially thought it was crazy, our tests quickly showed otherwise. Confronted by this unexpected reality, we made the decision to transition our stack to Apple's Core Graphics. Mac Minis were, at the time, the most effective dollar-per-gigaflop machines on the market specifically for images, so we quickly maxed out our credit cards at the local Apple store on Mac Minis. Some friends who were spinning down old Linux hardware donated their servers to us for all of the non-image-processing services we would need to run.

In the very early days, a handful of these servers were located in the CEO’s living room. (Don’t worry, they all had redundancies.) Over time, we began to expand into hosted colo facilities in Las Vegas (macminicolo.net) and Dallas (Linode). As we continued to grow, some old colleagues from YouTube offered us some space in their server cabinet at Equinix SV1 in San Jose. We quickly outgrew the 4U of space they gave us there and contracted out two more cabinets of our own. As our traffic continued to grow, we realized we needed to move into our own datacenter space.

The Current Stack

The core infrastructure of imgix is composed of many service layers. There is the origin fetching layer, the origin caching layer, the image processing layer, the load balancing and distribution layer, and the content delivery layer. Additionally, each layer interfaces with omnipresent configuration, logging, monitoring, and supervision services.
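
As a rough mental model only (the names here are hypothetical stand-ins, not our actual services), a request's path through those layers might look something like this:

```python
from dataclasses import dataclass, field

@dataclass
class RenderPipeline:
    # A purely illustrative, in-memory stand-in for the layers above.
    origin_cache: dict = field(default_factory=dict)

    def fetch_from_origin(self, url: str) -> bytes:
        # Origin fetching layer: pull the master image from customer storage.
        return b"<master image bytes>"

    def render(self, master: bytes, params: dict) -> bytes:
        # Image processing layer: apply the requested transformations.
        return b"<derivative for params: %s>" % str(sorted(params.items())).encode()

    def handle(self, url: str, params: dict) -> bytes:
        # Origin caching layer: only fetch a master image once.
        master = self.origin_cache.get(url)
        if master is None:
            master = self.fetch_from_origin(url)
            self.origin_cache[url] = master
        # The load balancing layer would pick a render node here, and the
        # delivery layer (CDN) would serve the result to the client.
        return self.render(master, params)

print(RenderPipeline().handle("photos/hero.jpg", {"w": 1200, "fit": "crop"}))
```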

Our fetching and caching layers are largely custom built, using MogileFS, nginx, and HAProxy as underlying technologies. The load balancing and distribution layer is based on custom C code and a LuaJIT framework we created called Levee, which is capable of servicing 40K requests per second on a single machine. By switching some services from Python to LuaJIT, we have seen 20x performance increases. We will be open-sourcing Levee once we feel it’s mature enough. At the boundaries, we run a combination of HAProxy, nginx, and OpenResty.

Our image processing layer is our most highly tuned and custom layer. We run a very high-performance custom image processing server that we built using C, Objective-C, and Core Graphics. Since the image operations themselves take a fraction of a millisecond on the GPU, most of our performance work has been in optimizing the path from the network interface and local memory cache into the GPU texture buffer. For images that fit entirely within the GPU texture buffer, we see end-to-end performance in the sub-50ms range. All changes to the image processing layer are run through a suite of regression tests to make sure we do not introduce any visual disparities between builds.
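
For readers unfamiliar with visual regression testing, the core idea can be sketched in a few lines with Pillow: diff a candidate render against a golden reference and fail on any per-channel delta above a small tolerance. This is an illustrative sketch, not our actual test harness, and the tolerance value is an assumption.

```python
from PIL import Image, ImageChops

def max_pixel_delta(golden_path: str, candidate_path: str) -> int:
    """Largest per-channel difference between two renders."""
    golden = Image.open(golden_path).convert("RGB")
    candidate = Image.open(candidate_path).convert("RGB")
    assert golden.size == candidate.size, "size drift is an automatic failure"
    diff = ImageChops.difference(golden, candidate)
    # getextrema() on an RGB image returns one (min, max) pair per channel.
    return max(band_max for _, band_max in diff.getextrema())

def assert_visually_identical(golden_path: str, candidate_path: str, tolerance: int = 2) -> None:
    # A small tolerance absorbs rounding noise between builds without
    # letting real visual disparities slip through.
    assert max_pixel_delta(golden_path, candidate_path) <= tolerance
```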

For last-mile content delivery, we use Fastly, which allows us to hyper-optimize our traffic at the edge using Varnish. With more than 20 global Fastly POPs, all imgix customers receive their images quickly. All of this working in unison means our 90th percentile end-to-end response time is under 700ms for first-fetch, uncached images during peak hours.

Logging each layer is critical to imgix, so we have had to build a comprehensive logging pipeline. We currently use Heka to handle much of the raw aggregation of data, and then feed the data downstream to Riemann, Hosted Graphite, and Google BigQuery (for real-time data, statistics, and analytics, respectively).
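
Conceptually, the fan-out looks something like the sketch below: one raw event stream split into three downstream feeds. The sink functions here are simple placeholders standing in for the real Riemann, Hosted Graphite, and BigQuery clients.

```python
# Route a single raw event to the three kinds of downstream consumers.
# The sinks are placeholders (print), not real client libraries.
def route_event(event: dict, sinks: dict) -> None:
    sinks["realtime"](event)        # Riemann: alerting on live data
    if "metric" in event:
        sinks["statistics"](event)  # Hosted Graphite: time-series statistics
    sinks["analytics"](event)       # BigQuery: long-term analytics queries

route_event(
    {"layer": "render", "metric": "render_ms", "value": 42},
    {"realtime": print, "statistics": print, "analytics": print},
)
```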

We leverage several open source projects to make managing and monitoring this stack easier. Ansible handles our configuration management, while Consul manages service discovery. Prometheus handles monitoring and plugs into the company PagerDuty account. We use StatusPage.io to report the current infrastructure status to our customers.

Our web front-end services are completely separate from our core infrastructure. They are built using Angular, Ember, or Tornado depending on the task. These services provide web interfaces to configure and administer your imgix account. We build separate Docker containers for development, testing, and production for each front-end project. We use CircleCI for our internal services and Travis CI for our open-source projects and libraries.
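
For a sense of what the Tornado-based services look like, here is a minimal, self-contained sketch of a handler and application. The route and handler are illustrative only, not one of our actual services.

```python
import tornado.ioloop
import tornado.web

class AccountHandler(tornado.web.RequestHandler):
    async def get(self, account_id):
        # A real handler would load account configuration from a backing
        # store; here we just echo the id back as JSON.
        self.write({"account": account_id, "status": "ok"})

def make_app() -> tornado.web.Application:
    return tornado.web.Application([
        (r"/accounts/([0-9]+)", AccountHandler),
    ])

if __name__ == "__main__":
    make_app().listen(8888)
    tornado.ioloop.IOLoop.current().start()
```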

imgix practices continuous integration, and often deploys several times per day. We use GitHub for hosting the repositories for each service. We use GitHub Issues for tracking work in progress, and Trello for planning our roadmap. We practice master-only development and GitHub Flow for iterating on each service.

Discussions around these services happen in Slack or around the proverbial water cooler.

Between the founding of imgix and now, Apple has released hardware that better fits our needs. New image processing nodes are now Mac Pros, which we rack in a custom 46U rack in our data center. Surprisingly, these passive racks have better space and power utilization than most Linux-based solutions.



In short, imgix is quite a bit more than just ImageMagick running on EC2.

Making Every Image Better

imgix was started to remove the headaches associated with managing images for websites and apps. Creating a performant, fetch-based API is not easy. We are hard at work leveling up the current infrastructure to provide some very powerful improvements and to reach new levels of scale. Although this stack currently handles over 100,000 images per second, we are excited to reach for 1 million images per second and beyond.

We are hiring.

If you have any questions about our stack, we’re more than happy to answer them in the comments. If you are tired of running your duct-taped ImageMagick setup, signing up for imgix is free.

