Powering Pinterest Ads Analytics with Apache Druid

669
Pinterest
Pinterest is a social bookmarking site where users collect and share photos of their favorite events, interests and hobbies. One of the fastest growing social networks online, Pinterest is the third-largest such network behind only Facebook and Twitter.

The Change

When we launched Promoted Pins in 2014, we chose Apache HBase as our database to store and serve all of our reporting metrics. At the beginning of our ads business, this was an appropriate choice because the number of reporting features needed and overall traffic was low. Additionally, HBase already had a proven record in the industry at this time, and we knew how to successfully operate an HBase cluster.

Five years later, our business has matured. As our ads scale has increased dramatically, so have the complexities of the metrics we report to our partners, which has rendered HBase insufficient for our fine-grained analytical needs. As a result, we surveyed the available options and settled on Druid to be the core component of our next iteration.

Why Druid?

HBase works very well when it comes to accessing random data points, but it’s not built for fast grouping and aggregation. In the past, we’ve solved this by pre-building these data views, but as the features needed for our reporting expanded, it was no longer possible to store so many different cuts. Druid allowed us to bypass all of this complicated data slicing ingestion logic, and also supports:

  • Real-time ingestion via Kafka
  • Automatic versioning of ingested data
  • Data pre-aggregation based on user-set granularity
  • Approximate algorithms for count-distinct questions
  • A SQL interface
  • Easy to understand code and a very supportive community

Data Ingestion

Druid supports two modes of ingestion: Native and Hadoop. Both are launched via a REST call to the Overlord node. In the native ingestion case, a thread is spawned directly on the MiddleManager node to read input data, while in the Hadoop case, Druid launches a MapReduce job to read the input in parallel. In both cases, the ingested data is automatically versioned based on its output datasource (table) and time interval. Druid will automatically start serving the newest version of the data as soon as it is available and keep the older segments in a disabled state, should we ever need to revert to a previous version. Since we have several different data pipelines producing different sets of metrics with the same dimensions into a single datasource, this was a problem for us. How do we keep the data versioned but not have each independent pipeline overwrite the previous one’s output?

Namespacing shard specs proved to be the answer. Druid’s standard approach to versioning segments is by their datasource name, time interval and time written. We expanded on this system by also including a namespace identifier. We then built a separate versioned interval timeline per namespace in a datasource, rather than just one timeline per datasource:

This also meant that we needed to either change the existing ingestion mechanisms to create segments with namespaces or invent a new ingestion mechanism. Since we ingest billions of events per day, native ingestion is too slow for us, and we were not keen on setting up a new Hadoop cluster and changing the Hadoop indexing code to adhere to namespaces.

Instead, we chose to adapt the metamx/druid-spark-batch project to write our own data ingestion using Spark. The original druid-spark-batch project works in a similar fashion to the Hadoop indexer, but instead of launching a Hadoop job, it launches a Spark job. Our project runs inside of a stand-alone job without the need to use any resources of the Druid cluster at all. It works as follows:

  1. Filter out events not belonging to the output interval
  2. Partition data into intervals based on the configured granularity and number of rows per segment file
  3. Use a pool of Druid’s IncrementalIndex classes to persist intermediate index files on disk in parallel
  4. Use a final merge pass to collect all index files into a segment file
  5. Push to deep storage
  6. Construct and write metadata to MySQL

Once the metadata is written, the Druid coordinator will find new segments on its next pull of the metadata table and assign the new segments to be served by historical nodes.

Cluster Setup

In general, the date ranges for querying advertising data fall into three categories:

  1. Most recent time period to display
  2. Year-over-year performance reporting
  3. Random ad-hoc queries of old, historical data.

The number of queries for the most recent day vastly outnumber all other reporting types. With this understanding, we bucketed our Druid cluster into three historical tiers:

  • A “hot” tier serving the most recent data on expensive compute-optimized nodes to handle large QPS.
  • A “cold” tier on mid compute, lots of disk space-optimized nodes. Serves the last year of data sans data in the Hot tier.
  • An “icy” tier on low compute nodes having even more disk space. Serves all other historical data.

Each historical in the hot tier has very low maximum data capacity to guarantee that all segments the node is serving are loaded in memory without needing to page swap. This ensures low latency for most of our user-driven queries. Queries for older data are generally made by automated systems or report exports which allow for higher latency in preference to high operating cost.

While this works very well for the average query patterns, there are cases of unexpected high load which require higher QPS tolerance from the cluster. The obvious solution here would be to scale up the number of historical nodes for these specific cases, but Druid’s data rebalancing algorithm is very slow at scale. It can take many hours or even days for a multi-terabyte cluster to rebalance data evenly once a new set of servers joins the fleet. To build an efficient auto-scaling solution, we could not afford to wait so long.

Since optimizing the rebalancing algorithm would be very risky to deploy on a huge production system, we decided instead to implement a solution for mirroring tiers. This system uses maximum bipartite matching to link each node in the mirror tier to exactly one node in the primary tier. Once the link is established, the mirroring historical doesn’t need to wait to be assigned segments by the rebalancing algorithm. Instead, it will pull the list of segments served by the linked node from the primary tier and download those from deep storage for serving. It doesn’t need to worry about replication since we expect these mirror tiers to be turned on and off very frequently, operating only during periods of heavy traffic. See below for more information:

During testing we were able to achieve significant auto-scaling improvement given a mirroring tier solution. The most significant portion of time taken now from server launch to query serving is limited I/O bandwidth from deep storage.

Time taken to load 31 TB of data. 2 hours for natural rebalancing. 5 minutes for mirroring tier.

Query Construction

Our Druid deployment is external facing, powering queries made interactively from our ads management system as well as programmatically through our external APIs. Often these query patterns will look very different per use case, but in all cases, we needed a service to construct Druid queries quickly and efficiently as well as to reject any invalid queries. Programmatic access to our API means that we receive a fair number of queries which request invalid dates or repetitive queries asking for entities which have no metrics.

Percent of queries returning empty results per API client. Some clients request non-existent metrics up to 90% of the time.

Constructing and asking Druid to execute these queries is possible but accrues overhead which is unaffordable in a low-latency system. To short-circuit queries for non-existent entities, we developed a metadata store listing entities and their metric-containing time intervals. If a query’s requested entities have no metrics for the specified time intervals, we can return immediately and relieve Druid from additional network and CPU workload.

Druid supports two APIs to query data: native and SQL. SQL support is a newer feature backed by Apache Calcite. In the backend, it takes a Druid SQL query, parses it, analyzes it, and turns it into a Druid native query which is then executed. SQL support has numerous advantages — it’s much more user friendly and certainly better at constructing more efficient ad-hoc queries than if the user was to come up with some unfamiliar JSON.

SQL was our first choice when implementing our query constructor and execution service namely due to our familiarity with SQL. It worked, but we quickly identified certain query patterns which Druid could not complete and traced the issue to performance bottlenecks in the SQL parser for queries with thousands of filters or many complicated projections. In the end, we settled on using native queries as our primary access path to Druid, keeping SQL support for internal use cases that are not latency sensitive.

System Tuning

Coming from a key-value world, the individual queries originating from our API layer were tailored to be low in complexity to allow an optimal number of point lookups. This also meant querying each entity individually, resulting in high QPS in the backend. To minimize the disruption to our entire infrastructure, we wanted to keep our changes simple and get as close as possible to simply exchanging HBase for Druid. In practice, that proved to be completely impossible.

Druid holds network connections between servers in a greedy manner, using a set of new connections per query. It also opens object handles per query, which is the primary bottleneck in a high QPS system. To lessen the network load, we ramped up the complexity of each query by batching the number of requested entities. We observed our system to perform at its best with between 1,000 to 2,000 requested entities in IN filter type queries, although every deployment will differ.

QPS after implementing query batching. 15,000 request / second peaks lowered by 10x

On the server side, we found the basic cluster tuning guidance suggested by the Druid documentation very helpful. One non-obvious caveat is being mindful of how many GroupBy queries can be executed at any time given the number of merge buffers configured. GroupBy queries should be avoided whenever possible in preference to Timeseries and TopN queries. These types of queries do not require merge buffers and therefore need fewer resources to execute. In our stack, we have the option to impose rate limiting based on query type to avoid too many GroupBy queries at once given the number of configured merge buffers.

The Future

We’re excited to have finished the long journey to bring Druid into production, but of course our work continues. As Pinterest’s business grows, our work on the core Druid platform for analytics has to evolve alongside it. It might be difficult to seamlessly contribute all our effort into the main Druid repository, but we hope to share our effort with the community. Namely on features such as a Spark writer and reader of Druid segments, mirroring tiers for auto scaling, and developing a new multiplexing IPC protocol instead of HTTP. While ads analytics matures, we are also onboarding other teams’ use cases, helping them discover how best to use Druid at scale for their needs.

Acknowledgments

This project was a joint effort across multiple teams: Ads Data, Ads API, and Storage & Caching. Contributors and advisors include Lucilla Chalmer, Tian-Ying Chang, Julian Jaffe, Eric Nguyen, Jian Wang, Weihong Wang, Caijie Zhang, and Wayne Zhao.

Credit also goes to Imply.io leaders Gian Merlino and Fangjin Yang for introducing us to and helping us bootstrap Druid.

We’re building the world’s first visual discovery engine. More than 320 million people around the world use Pinterest to dream about, plan and prepare for things they want to do in life. Come join us!

Pinterest
Pinterest is a social bookmarking site where users collect and share photos of their favorite events, interests and hobbies. One of the fastest growing social networks online, Pinterest is the third-largest such network behind only Facebook and Twitter.
Tools mentioned in article
Open jobs at Pinterest
Engineering Manager, Advertiser Inter...
San Francisco, CA

Pinterest is one of the fastest growing online ad platforms and our success depends on managing our Ads well and keeping them safe for our pinners. The Advertiser Interfaces & Growth team is uniquely positioned to help our advertisers and our partners create and manage their campaigns. You’ll be joining an early stage team that is growing quickly and laying the foundation for Pinterest’s business success.

What you’ll do:

  • Collaborate with stakeholders across the organization to architect solutions that optimize our products and tools required to support the growing number of Pinterest advertisers
  • Provide technical and team leadership on rapid short-term projects/feature development as well as longer-term development of new services
  • Partner with product management to set engineering priorities, estimate scope of work, define release schedules, and track progress
  • Actively foster high-quality software development through code reviews, pair-programming, and targeted feature development and when needed to unblock the team, prototype new technologies and systems, or demonstrate good coding practices
  • Mentor and develop engineers and engineering managers
  • Enable and enhance production readiness of the broader ads manager ecosystem

What we’re looking for:

  • BS in Computer Science or a related technical field
  • 8+ years of experience as an engineering manager, preferably managing managers
  • 7+ years of software engineering experience as a full stack or backend engineer
  • Track record of developing high quality software in an automated build and deployment environment.
  • Experience leading a team of engineers through a significant feature or product launch in collaboration with Product and Design.
  • Experience with 1 or more of: Python, Java, JavaScript (Node.js, React, Angular)
  • Well versed in agile development methodologies (i.e. scrum, kanban)
  • Experience with ad tech a big plus

#LI-JY1

Partnerships Engineer
San Francisco, CA

The Business Development team at Pinterest leads partnerships across a diverse set of companies to provide our Pinners with world class experiences bringing them the inspiration to create a life they love. As a Partnerships Engineer, you’ll work with a variety of partners across Pinterest products (Creators, SMB, Ads, Shopping, Growth, Search etc) to define and build scalable Partner solutions, helping hundreds of millions of people find ideas and inspiration. You will be the evangelist of Pinterest’s technology stack, both internationally and domestically.

What you'll do:

  • Be a subject matter expert in Pinterest’s products and their underlying implementation
  • Define, develop and scale adoption of Pinterest partner products with strategic partners
  • Influence partners to build strategically viable technology using Pinterest’s tech stack, and work with these partners to make their launch successful in market
  • Collaborate with cross-functional teams on developing an overall developer ecosystem vision
  • Work closely with PM and Engineering teams to build right products per partner needs
  • Develop tools, processes, documentation and sample code to launch and scale partner products.
  • Represent Pinterest at developer events

What we're looking for:

  • BS in Computer Science or equivalent technical degree
  • 3+ plus years of software engineering experience, ideally in large consumer tech or any platform tech company with a strong ecosystem approach
  • Experience in partner/client facing roles or in product
  • Experience in large scale API/SDK based implementations
  • Deep understanding of mobile and web technology stack
  • Ability to read and write code as part of the standard web stack (HTTP, HTML, JavaScript) and familiarity with programming languages such as Python and Java

#LI-KO1

Engineering Manager, Shopping Discovery
Palo Alto, CA

The shopping team at Pinterest is inventing a brand new, more visual and personalized shopping experience for 300M+ users worldwide. Shopping is at the core of Pinterest’s mission to help people create a life they love. Every day hundreds of millions of users (Pinners) come on Pinterest to find inspiration to decorate their home, to wear outfits on different occasions, host parties and various other things to create a better life. Those inspirations are visual and reflect very detailed tastes of the Pinners regarding the choice of color, style, etc. The shopping team is responsible for connecting inspiration to products that Pinners would like to buy and create a life they love.

Connecting inspiration to product is very challenging and requires an understanding of user preferences, the content of the image, visual matching of images and selecting and ranking the top images based on various signals. More interestingly, we need to solve these challenges at the awe-inspiring scale of Pinterest for 300M+ users, tens of billions of inspiration pins and hundreds of millions of products. The team is using one of the most sophisticated computer vision techniques for image matching, deep learning for user understanding and ranking at the scale unimaginable at most places. If you are excited to improve lives using the magic of AI/ML at a very large scale then you must consider this position.

 

What you'll do:

  • Lead and manage the Shopping Discovery team of 7+ machine learning scientists and engineers in Palo Alto
  • Lead the effort to develop and improve shopping recommendation and search
  • Help drive technical strategy and longer term vision for Shopping at Pinterest
  • Spend 60% time on technical leadership/IC work and 40% time on people management
  • Use machine learning / deep learning techniques to solve some of the most large scale recommendation and search problems in the industry
  • Collaborate with partner teams like product, organic search, recommendations etc.

What we're looking for:

  • Ph.D. and 5+ years of experience or Masters and 8+ years of experience
  • Engineering Management experience for team of 5+ ML Engineers
  • Strong machine learning background within search, recommendations or similar ML problems

 

 

#LI-LP1

Senior Engineering Manager, Homefeed ...
San Francisco, CA

Homefeed is a discovery platform at Pinterest that helps users find and explore their personal interests. We work with some of the largest datasets in the world, tailoring over billions of unique content to 330M+ users. Our content ranges across all categories like home decor, fashion, food, DIY, technology, travel, automotive, and much more. Our dataset is rich with textual and visual content and has nice graph properties — harnessing these signals at scale is a significant challenge. The homefeed ranking team focuses on the machine learning model that predicts how likely a user will interact with a certain piece of content, as well as leveraging those individual prediction scores for holistic optimization to present users with a feed of diverse content.

What you’ll do:

  • Technical lead and engineering manager for the Homefeed Ranking team in San Francisco
  • Help drive technical strategy and longer term vision for machine learning and recommendation at Pinterest
  • Lead a senior team of 10 Machine Learning engineers
  • Hands-on role, spending 60% time on technical leadership/IC work and 40% time on people management
  • Use machine learning / deep learning techniques to solve of the most large scale recommendation problems in the industry
  • Collaborate with partner teams like product, data science, business, ads

What we’re looking for:

  • Graduate degree plus 5+ years of industry experience 
  • Technical lead experience and some engineering management experience 
  • Strong machine learning background within ranking, recommendations, optimization or similar ML problems

#LI-TG1

Verified by
Tech Lead, Big Data Platform
Software Engineer
Talent Brand Manager
Sourcer
Software Engineer
You may also like
3 Ways to Run Kubernetes on AWS
Simplifying Web Deploys
The Business Case for Container Adoption
Best Practices for Short-term and Permanent Flags