Powering Pinterest Ads Analytics with Apache Druid

Pinterest
Pinterest is a social bookmarking site where users collect and share photos of their favorite events, interests and hobbies. One of the fastest growing social networks online, Pinterest is the third-largest such network behind only Facebook and Twitter.

The Change

When we launched Promoted Pins in 2014, we chose Apache HBase as our database to store and serve all of our reporting metrics. At the beginning of our ads business, this was an appropriate choice because the number of reporting features needed and the overall traffic were low. Additionally, HBase already had a proven record in the industry at the time, and we knew how to operate an HBase cluster successfully.

Five years later, our business has matured. As our ads scale has increased dramatically, so has the complexity of the metrics we report to our partners, which has rendered HBase insufficient for our fine-grained analytical needs. As a result, we surveyed the available options and settled on Druid as the core component of our next iteration.

Why Druid?

HBase works very well when it comes to accessing random data points, but it’s not built for fast grouping and aggregation. In the past, we solved this by pre-building these data views, but as the features needed for our reporting expanded, it was no longer possible to store so many different cuts. Druid allowed us to bypass all of this complicated data-slicing ingestion logic. It also supports:

  • Real-time ingestion via Kafka
  • Automatic versioning of ingested data
  • Data pre-aggregation based on user-set granularity
  • Approximate algorithms for count-distinct questions
  • A SQL interface
  • Easy-to-understand code and a very supportive community

Data Ingestion

Druid supports two modes of ingestion: Native and Hadoop. Both are launched via a REST call to the Overlord node. In the native ingestion case, a thread is spawned directly on the MiddleManager node to read input data, while in the Hadoop case, Druid launches a MapReduce job to read the input in parallel. In both cases, the ingested data is automatically versioned based on its output datasource (table) and time interval. Druid automatically starts serving the newest version of the data as soon as it is available and keeps the older segments in a disabled state, should we ever need to revert to a previous version.

This overwrite-by-version behavior was a problem for us, because several different data pipelines produce different sets of metrics with the same dimensions into a single datasource. How do we keep the data versioned without having each independent pipeline overwrite the previous one’s output?

Namespacing shard specs proved to be the answer. Druid’s standard approach identifies segment versions by datasource name, time interval and time written. We expanded on this system by also including a namespace identifier. We then built a separate versioned interval timeline per namespace in a datasource, rather than just one timeline per datasource.
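
As a rough illustration of the idea (a sketch only, not our actual shard spec code), the newest version can be resolved per datasource, namespace and interval instead of per datasource and interval. The `Segment` fields, the datasource name and the namespace names below are hypothetical, and intervals are compared by exact equality, whereas a real timeline also has to handle partial overlaps.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Segment:
    datasource: str
    namespace: str  # hypothetical: one namespace per producing pipeline
    interval: str   # e.g. "2019-06-01/2019-06-02"
    version: str    # ISO-8601 timestamp of when the segment was written


def visible_segments(segments: Iterable[Segment]) -> Dict[Tuple[str, str, str], Segment]:
    """Keep the newest version per (datasource, namespace, interval).

    Simplification: intervals are matched by exact equality only.
    """
    newest: Dict[Tuple[str, str, str], Segment] = {}
    for seg in segments:
        key = (seg.datasource, seg.namespace, seg.interval)
        current = newest.get(key)
        # ISO-8601 timestamps in the same format sort lexicographically.
        if current is None or seg.version > current.version:
            newest[key] = seg
    return newest


segments = [
    Segment("ads_metrics", "spend", "2019-06-01/2019-06-02", "2019-06-02T01:00:00Z"),
    Segment("ads_metrics", "conversions", "2019-06-01/2019-06-02", "2019-06-02T03:00:00Z"),
    Segment("ads_metrics", "spend", "2019-06-01/2019-06-02", "2019-06-03T01:00:00Z"),
]
live = visible_segments(segments)
```

Here the rewritten `spend` segment replaces only the older `spend` segment. The `conversions` segment for the same interval stays visible because it lives on its own timeline.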

This also meant that we needed to either change the existing ingestion mechanisms to create segments with namespaces or invent a new ingestion mechanism. Since we ingest billions of events per day, native ingestion is too slow for us, and we were not keen on setting up a new Hadoop cluster and changing the Hadoop indexing code to adhere to namespaces.

Instead, we chose to adapt the metamx/druid-spark-batch project to write our own data ingestion using Spark. The original druid-spark-batch project works in a similar fashion to the Hadoop indexer, but instead of launching a Hadoop job, it launches a Spark job. Our project runs inside of a stand-alone job without the need to use any resources of the Druid cluster at all. It works as follows:

  1. Filter out events not belonging to the output interval
  2. Partition data into intervals based on the configured granularity and number of rows per segment file
  3. Use a pool of Druid’s IncrementalIndex classes to persist intermediate index files on disk in parallel
  4. Use a final merge pass to collect all index files into a segment file
  5. Push to deep storage
  6. Construct and write metadata to MySQL

Once the metadata is written, the Druid coordinator will find new segments on its next pull of the metadata table and assign the new segments to be served by historical nodes.
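
The first two steps of the list above can be sketched in plain Python. This is an illustration under assumptions rather than our Spark job: it uses a fixed day granularity, the event shape and the `partition_events` name are hypothetical, and the indexing, merging and publishing steps are left out.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_events(events, start, end, max_rows_per_segment):
    """Steps 1 and 2: keep events in [start, end), bucket them by day,
    then split each day into chunks of at most max_rows_per_segment rows."""
    by_day = defaultdict(list)
    for timestamp, row in events:
        if start <= timestamp < end:  # step 1: drop out-of-interval events
            by_day[timestamp.date()].append(row)
    partitions = {}
    for day, rows in by_day.items():
        for offset in range(0, len(rows), max_rows_per_segment):
            partition_num = offset // max_rows_per_segment
            partitions[(day, partition_num)] = rows[offset:offset + max_rows_per_segment]
    return partitions


def utc(*args):
    """Build a timezone-aware UTC datetime."""
    return datetime(*args, tzinfo=timezone.utc)


events = [(utc(2019, 6, 1, hour), {"ad_id": "a1", "impressions": 1}) for hour in range(5)]
events.append((utc(2019, 5, 31, 23), {"ad_id": "a1", "impressions": 1}))  # before the interval
events.append((utc(2019, 6, 2, 0), {"ad_id": "a1", "impressions": 1}))    # after the interval
parts = partition_events(events, utc(2019, 6, 1), utc(2019, 6, 2), max_rows_per_segment=2)
```

Each key of `parts` corresponds to one future segment file, so the row limit directly controls how many files a busy day produces.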

Cluster Setup

In general, the date ranges for querying advertising data fall into three categories:

  1. Most recent time period to display
  2. Year-over-year performance reporting
  3. Random ad-hoc queries of old, historical data.

Queries for the most recent day vastly outnumber all other reporting types. With this understanding, we bucketed our Druid cluster into three historical tiers:

  • A “hot” tier serving the most recent data on expensive compute-optimized nodes to handle large QPS.
  • A “cold” tier on nodes with mid-range compute and plenty of disk space. It serves the last year of data, excluding what is already in the hot tier.
  • An “icy” tier on low compute nodes having even more disk space. Serves all other historical data.

Each historical in the hot tier has a very low maximum data capacity to guarantee that all segments the node serves are loaded in memory without any page swapping. This ensures low latency for most of our user-driven queries. Queries for older data are generally made by automated systems or report exports, which can tolerate higher latency in exchange for lower operating cost.
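
The routing of data to tiers by age can be sketched as a small function. The 30-day hot window is an assumption made for illustration, since the cut-off is not stated above, and the function name is hypothetical. In a Druid cluster this kind of policy is normally expressed through the coordinator's retention rules rather than application code.

```python
from datetime import date, timedelta

HOT_WINDOW = timedelta(days=30)    # assumption: the hot window is not stated in the text
COLD_WINDOW = timedelta(days=365)  # "the last year of data"


def tier_for(segment_day: date, today: date) -> str:
    """Pick the historical tier for a day of data based on its age."""
    age = today - segment_day
    if age <= HOT_WINDOW:
        return "hot"
    if age <= COLD_WINDOW:
        return "cold"
    return "icy"


today = date(2019, 12, 1)
recent_tier = tier_for(date(2019, 11, 20), today)
last_year_tier = tier_for(date(2019, 6, 1), today)
old_tier = tier_for(date(2017, 1, 1), today)
```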

While this works very well for the average query patterns, there are cases of unexpectedly high load which require higher QPS tolerance from the cluster. The obvious solution here would be to scale up the number of historical nodes for these specific cases, but Druid’s data rebalancing algorithm is very slow at scale. It can take many hours or even days for a multi-terabyte cluster to rebalance data evenly once a new set of servers joins the fleet. To build an efficient auto-scaling solution, we could not afford to wait so long.

Since optimizing the rebalancing algorithm would be very risky to deploy on a huge production system, we decided instead to implement a solution for mirroring tiers. This system uses maximum bipartite matching to link each node in the mirror tier to exactly one node in the primary tier. Once the link is established, the mirroring historical doesn’t need to wait to be assigned segments by the rebalancing algorithm. Instead, it pulls the list of segments served by the linked node from the primary tier and downloads those from deep storage for serving. It doesn’t need to worry about replication since we expect these mirror tiers to be turned on and off very frequently, operating only during periods of heavy traffic.
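
The linking step can be illustrated with a textbook augmenting-path matching (often called Kuhn's algorithm). This is a sketch of the technique, not our production code. The node names are hypothetical, and so is the `candidates` input: which primary nodes a given mirror node is allowed to link to (for example, by disk capacity) is an assumption made for the example.

```python
from typing import Dict, List


def match_mirrors(candidates: Dict[str, List[str]]) -> Dict[str, str]:
    """Maximum bipartite matching via augmenting paths.

    candidates maps each mirror node to the primary nodes it could mirror.
    Returns mirror -> primary, with each primary linked at most once.
    """
    primary_to_mirror: Dict[str, str] = {}

    def try_assign(mirror: str, visited: set) -> bool:
        for primary in candidates[mirror]:
            if primary in visited:
                continue
            visited.add(primary)
            holder = primary_to_mirror.get(primary)
            # Take a free primary, or move its current holder to another one.
            if holder is None or try_assign(holder, visited):
                primary_to_mirror[primary] = mirror
                return True
        return False

    for mirror in candidates:
        try_assign(mirror, set())
    return {mirror: primary for primary, mirror in primary_to_mirror.items()}


candidates = {
    "mirror-1": ["primary-a", "primary-b"],
    "mirror-2": ["primary-a"],
    "mirror-3": ["primary-b", "primary-c"],
}
links = match_mirrors(candidates)
```

In this example `mirror-1` first takes `primary-a`, then gives it up to `mirror-2`, which has no other option. Each linked mirror would then fetch the segment list of its primary and load those segments from deep storage.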

During testing, the mirroring tier solution significantly improved auto-scaling. Most of the time between server launch and query serving is now bounded by the I/O bandwidth available from deep storage.

Time taken to load 31 TB of data. 2 hours for natural rebalancing. 5 minutes for mirroring tier.

Query Construction

Our Druid deployment is external facing, powering queries made interactively from our ads management system as well as programmatically through our external APIs. Often these query patterns will look very different per use case, but in all cases, we needed a service to construct Druid queries quickly and efficiently as well as to reject any invalid queries. Programmatic access to our API means that we receive a fair number of queries which request invalid dates or repetitive queries asking for entities which have no metrics.

Percent of queries returning empty results per API client. Some clients request non-existent metrics up to 90% of the time.

Constructing and asking Druid to execute these queries is possible but accrues overhead which is unaffordable in a low-latency system. To short-circuit queries for non-existent entities, we developed a metadata store listing entities and their metric-containing time intervals. If a query’s requested entities have no metrics for the specified time intervals, we can return immediately and relieve Druid from additional network and CPU workload.
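
A minimal sketch of that short-circuit check is shown below, assuming the metadata store can be read as a mapping from entity to a list of half-open date intervals. The store layout, the entity names and the function names are hypothetical.

```python
from datetime import date


def overlaps(a, b):
    """True when half-open intervals [a0, a1) and [b0, b1) intersect."""
    return a[0] < b[1] and b[0] < a[1]


def entities_with_metrics(entity_ids, requested, store):
    """Return only the entities that have metrics somewhere in `requested`."""
    return [
        entity for entity in entity_ids
        if any(overlaps(requested, interval) for interval in store.get(entity, []))
    ]


# Hypothetical metadata store contents.
store = {
    "campaign-1": [(date(2019, 1, 1), date(2019, 3, 1))],
    "campaign-2": [(date(2019, 6, 1), date(2019, 7, 1))],
}
requested = (date(2019, 6, 10), date(2019, 6, 20))
to_query = entities_with_metrics(["campaign-1", "campaign-2", "campaign-3"], requested, store)
# An empty list here means the service can answer without contacting Druid.
```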

Druid supports two APIs to query data: native and SQL. SQL support is a newer feature backed by Apache Calcite. In the backend, it takes a Druid SQL query, parses it, analyzes it, and turns it into a Druid native query which is then executed. SQL support has numerous advantages: it’s much more user friendly, and it is certainly better at constructing efficient ad-hoc queries than a user who has to come up with unfamiliar JSON.

SQL was our first choice when implementing our query constructor and execution service, mainly due to our familiarity with SQL. It worked, but we quickly identified certain query patterns which Druid could not complete and traced the issue to performance bottlenecks in the SQL parser for queries with thousands of filters or many complicated projections. In the end, we settled on using native queries as our primary access path to Druid, keeping SQL support for internal use cases that are not latency sensitive.
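
For readers unfamiliar with the native format, the sketch below assembles a timeseries query as a Python dictionary and serializes it to the JSON a client would send to a Druid broker. The datasource, dimension and metric names are hypothetical, and the helper is an illustration rather than our query constructor. Note that this query sums each metric across all listed entities per day.

```python
import json


def build_timeseries_query(datasource, interval, entity_dimension, entity_ids, metrics):
    """Assemble a native Druid timeseries query with an IN filter."""
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "intervals": [interval],
        "granularity": "day",
        "filter": {
            "type": "in",
            "dimension": entity_dimension,
            "values": list(entity_ids),
        },
        "aggregations": [
            {"type": "longSum", "name": metric, "fieldName": metric}
            for metric in metrics
        ],
    }


query = build_timeseries_query(
    datasource="ads_metrics",            # hypothetical datasource
    interval="2019-06-01/2019-06-08",
    entity_dimension="campaign_id",      # hypothetical dimension
    entity_ids=["c1", "c2"],
    metrics=["impressions", "clicks"],   # hypothetical metric columns
)
payload = json.dumps(query)
```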

System Tuning

Coming from a key-value world, the individual queries originating from our API layer were tailored to be low in complexity to allow an optimal number of point lookups. This also meant querying each entity individually, resulting in high QPS in the backend. To minimize the disruption to our entire infrastructure, we wanted to keep our changes simple and get as close as possible to simply exchanging HBase for Druid. In practice, that proved to be completely impossible.

Druid holds network connections between servers in a greedy manner, using a set of new connections per query. It also opens object handles per query, which is the primary bottleneck in a high QPS system. To lessen the network load, we ramped up the complexity of each query by batching the number of requested entities. We observed our system to perform at its best with between 1,000 and 2,000 requested entities per query using IN-type filters, although every deployment will differ.
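
Batching itself is straightforward. The sketch below groups entity IDs into chunks and builds one IN filter per chunk, so that a request for 2,500 entities turns into three Druid queries instead of 2,500. The function names, the dimension name and the default batch size of 1,000 are assumptions for illustration.

```python
from typing import Iterator, List, Sequence


def batch_entities(entity_ids: Sequence[str], batch_size: int = 1000) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most batch_size entity IDs."""
    for start in range(0, len(entity_ids), batch_size):
        yield list(entity_ids[start:start + batch_size])


def in_filter(dimension: str, values: List[str]) -> dict:
    """Build a native Druid IN filter for one batch."""
    return {"type": "in", "dimension": dimension, "values": values}


entity_ids = [f"ad-{i}" for i in range(2500)]
filters = [in_filter("ad_id", batch) for batch in batch_entities(entity_ids)]
```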

QPS after implementing query batching. Peaks of 15,000 requests per second were lowered by 10x.

On the server side, we found the basic cluster tuning guidance suggested by the Druid documentation very helpful. One non-obvious caveat is being mindful of how many GroupBy queries can be executed at any time given the number of merge buffers configured. GroupBy queries should be avoided whenever possible in favor of Timeseries and TopN queries. These query types do not require merge buffers and therefore need fewer resources to execute. In our stack, we have the option to impose rate limiting based on query type, so that the number of concurrent GroupBy queries never exceeds what the configured merge buffers can support.
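
One way to picture such a limit is a gate in the query service that admits GroupBy queries only while a slot is free. The sketch below is an assumption-laden illustration, not our rate limiter: the `QueryGate` class is hypothetical, and it assumes each GroupBy query occupies exactly one merge buffer for its whole lifetime.

```python
import threading
from contextlib import contextmanager


class QueryGate:
    """Admit at most `merge_buffers` concurrent groupBy queries.

    Assumption: one merge buffer per groupBy query; other query
    types pass through without taking a slot.
    """

    def __init__(self, merge_buffers: int):
        self._slots = threading.BoundedSemaphore(merge_buffers)

    @contextmanager
    def admit(self, query_type: str):
        needs_buffer = query_type == "groupBy"
        # Reject immediately instead of queueing when no slot is free.
        if needs_buffer and not self._slots.acquire(blocking=False):
            raise RuntimeError("too many concurrent groupBy queries")
        try:
            yield
        finally:
            if needs_buffer:
                self._slots.release()


gate = QueryGate(merge_buffers=1)
rejected = False
with gate.admit("groupBy"):
    try:
        with gate.admit("groupBy"):  # second groupBy while the first is running
            pass
    except RuntimeError:
        rejected = True
    with gate.admit("timeseries"):   # not limited by merge buffers
        pass
```

Rejecting instead of queueing keeps latency predictable for callers. A deployment could equally choose to wait for a slot with a timeout.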

The Future

We’re excited to have finished the long journey to bring Druid into production, but of course our work continues. As Pinterest’s business grows, our work on the core Druid platform for analytics has to evolve alongside it. It might be difficult to seamlessly contribute all of our work to the main Druid repository, but we hope to share it with the community, namely features such as a Spark writer and reader of Druid segments, mirroring tiers for auto-scaling, and a new multiplexing IPC protocol to replace HTTP. While ads analytics matures, we are also onboarding other teams’ use cases, helping them discover how best to use Druid at scale for their needs.

Acknowledgments

This project was a joint effort across multiple teams: Ads Data, Ads API, and Storage & Caching. Contributors and advisors include Lucilla Chalmer, Tian-Ying Chang, Julian Jaffe, Eric Nguyen, Jian Wang, Weihong Wang, Caijie Zhang, and Wayne Zhao.

Credit also goes to Imply.io leaders Gian Merlino and Fangjin Yang for introducing us to and helping us bootstrap Druid.

We’re building the world’s first visual discovery engine. More than 320 million people around the world use Pinterest to dream about, plan and prepare for things they want to do in life. Come join us!
