Powering Inclusive Search & Recommendations with Our New Visual Skin Tone Model

Pinterest
Pinterest is a social bookmarking site where users collect and share photos of their favorite events, interests and hobbies. One of the fastest growing social networks online, Pinterest is the third-largest such network behind only Facebook and Twitter.

By Nadia Fawaz | Research Scientist & Tech Lead, Applied Science, Bhawna Juneja | Software Engineer, Search Quality, David Xue | Software Engineer, Visual Search


To truly bring everyone the inspiration to create a life they love, Pinterest is committed to content diversity and to developing inclusive search and recommendation engines. A top request we hear from Pinners is that they want to feel represented in the product, which is why we built our first version of skin tone ranges, an inclusive search feature, in 2018. We’re proud to introduce the latest version of skin tone ranges, a newly built in-house technology. These new skin tone ranges are paving the way for more inclusive inspirations to be recommended in search, as well as in our augmented reality technology, Try on, and are driving initiatives for more diverse recommendations across the platform.

Skin tone ranges in Beauty Search and in AR Try-on Similar Looks

Developing more inclusive skin tone ranges

Trying to understand the skin tone range in an image is a complex challenge for computer vision systems, given the impact of shadows, different lighting, and a variety of other impediments. Developing inclusive skin tone ranges required an end-to-end iterative process to build, evaluate and improve performance over several versions. While qualitative evaluation could help reveal issues, in order to make progress, we needed to measure performance gaps across skin tone ranges and understand the error patterns for each range.

A variety of lighting conditions

Starting with diverse data

We labeled a diverse set of beauty images covering a wide range of skin tones to evaluate system performance during development. Measuring performance is important to assess progress. However, coarse aggregate metrics over the entire dataset, such as accuracy, are not sufficient, because aggregation may hide performance discrepancies between skin tone ranges. To quantify performance biases, we went beyond overall aggregates and computed granular metrics per skin tone range, including precision, recall, and F1-score. Per-range metrics show whether errors disproportionately affect some ranges. We also used confusion matrices to analyze error patterns for each range. The matrices reveal whether a model fails to predict a skin tone for images in a range, leading to very low recall and F1-score for that range, or whether it fails to distinguish images from different ranges and misclassifies them, impacting recall and precision for several ranges, as in the examples below.
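As a rough illustration of why per-range metrics matter, the sketch below computes accuracy, a confusion matrix, and per-range precision, recall and F1-score on made-up labels. The range names, the toy data, and the use of None for "no prediction" are assumptions for illustration only, not Pinterest's evaluation code.

```python
from collections import Counter

# Hypothetical range names; the article only says there are four ranges.
RANGES = ["range_1", "range_2", "range_3", "range_4"]


def accuracy(y_true, y_pred):
    """Share of images whose predicted range equals the labeled range."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true, y_pred, labels):
    """Rows are labeled ranges, columns are predicted ranges.

    A prediction of None (no skin tone produced) falls outside every column.
    """
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def per_range_metrics(y_true, y_pred, labels):
    """Precision, recall and F1-score computed separately for each range."""
    metrics = {}
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Toy data: the darkest range is mostly misclassified or left unpredicted.
y_true = ["range_1"] * 4 + ["range_2"] * 4 + ["range_3"] * 4 + ["range_4"] * 4
y_pred = (["range_1"] * 4
          + ["range_2"] * 3 + ["range_1"]
          + ["range_3"] * 4
          + ["range_3", "range_3", None, "range_4"])

overall = accuracy(y_true, y_pred)
matrix = confusion_matrix(y_true, y_pred, RANGES)
metrics = per_range_metrics(y_true, y_pred, RANGES)
```

In this toy data the overall accuracy is 0.75, while recall for range_4 is only 0.25. A gap like this is exactly what the aggregate number hides.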

Examples of issues

To understand the root causes of issues, we performed an error analysis of the components of the skin tone system based on their output. At a high level, a skin tone system may include:

  • a detection model that attempts to determine the presence and location of a face in a beauty image, but does not attempt to recognize an individual person’s face
  • a color extraction module
  • a scorer and thresholder to estimate the skin tone range

Analyzing the score distributions per skin tone range over the diverse dataset can show if the score distributions are separable or if they overlap, and if the thresholds are out-of-phase with the diverse data, as in the example above. Both issues can be amplified by color extraction failures in challenging lighting conditions. Studying face detection errors can reveal if the model fails to detect faces in beauty images with a darker skin tone at significantly higher rates than in images with lighter skin tones, which would preclude the system from generating a skin tone range for these images. This type of bias in face detection models can carry over to the skin tone system, and no amount of downstream post-processing for fairness on the output of the system can correct such upstream bias. Biases in face detection have been analyzed previously in the Gender Shades study by Joy Buolamwini and Timnit Gebru. Requiring face detection to predict skin tone also limits the scope of the system, as it cannot handle images of other body parts such as manicured hands, and it contributes to the overall system latency and scalability challenges.
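One simple way to check for the upstream face detection bias described above is to compare detection rates across labeled ranges. The following is a minimal sketch on invented counts; the record format and range names are hypothetical.

```python
from collections import defaultdict


def detection_rate_by_range(records):
    """records: iterable of (labeled_range, face_detected) pairs."""
    totals = defaultdict(int)
    detected = defaultdict(int)
    for skin_range, face_found in records:
        totals[skin_range] += 1
        detected[skin_range] += int(face_found)
    return {r: detected[r] / totals[r] for r in totals}


# Invented example: 10 labeled images per range.
records = (
    [("range_1", True)] * 9 + [("range_1", False)] * 1
    + [("range_4", True)] * 6 + [("range_4", False)] * 4
)

rates = detection_rate_by_range(records)
# Gap between the best and worst covered ranges.
gap = max(rates.values()) - min(rates.values())
```

Any image with no detected face never reaches the scorer, so a large gap here caps recall for the affected range regardless of how the downstream thresholds are tuned.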

Through analysis, we reached the conclusion that to improve fairness in performance across all skin tone ranges, we needed to build an end-to-end system with bias mitigation.

Developing new skin tone ranges by mitigating biases

Visual skin tone ranges V1: Mitigating bias

We developed the new visual skin tone v1 ranges based on visual input and focused on:

  • mitigating biases so that skin tone prediction performs well across all ranges
  • creating a signal that doesn’t require the presence of a full front-facing face, but also works for partial faces or other body parts
  • extending to applications beyond beauty, such as fashion
  • leveraging this more reliable signal as a building block to improve fairness and reduce potential bias in other ML models

The visual skin tone v1 leverages several computer vision techniques to estimate the skin tone range in a beauty image. After exposure correction, a face detection model identifies the face area and landmarks corresponding to facial features such as eyes, eyebrows, nose, mouth and face edge. This face detection model has better coverage on images with darker skin tones. Some facial features, such as eyes and lips, are then cropped out, and binary erosion is applied to remove hair and edge noise and finally produce a face skin mask. If face detection fails to identify a face in the image, for example in images of other body parts, Hue Saturation Value (HSV) processing attempts to locate skin pixels and produces a skin mask. The color extraction module then estimates a dominant color based on the RGB distribution of the skin mask pixels. The dominant color is converted to the LAB color space, and the individual typology angle (ITA) is computed as a nonlinear function of the L and b coordinates. The resulting ITA scores are more separable across ranges. Using a diverse dataset of images, fairness-aware tuning is performed on the ITA scores to produce a skin tone prediction while mitigating biases in performance between ranges.
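The color extraction and scoring steps can be sketched as follows. This is an illustration under stated assumptions, not the production module: the per-channel median is a stand-in for the unspecified dominant color estimator, the conversion assumes sRGB input with a D65 white point, and the ITA uses the commonly cited definition, arctan((L − 50) / b) expressed in degrees. The tuned thresholds that map ITA scores to ranges are not given in the article and are not shown.

```python
import math
from statistics import median


def dominant_color(pixels):
    """Per-channel median of (R, G, B) skin mask pixels (stand-in estimator)."""
    return tuple(median(p[i] for p in pixels) for i in range(3))


def _srgb_to_linear(channel):
    """Undo the sRGB gamma curve for one 0-255 channel value."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _lab_f(t):
    """Piecewise cube-root function from the CIE L*a*b* definition."""
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29


def rgb_to_lab(rgb):
    """Convert an sRGB color (0-255 per channel) to CIE L*a*b* (D65 white)."""
    r, g, b = (_srgb_to_linear(c) for c in rgb)
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    fx, fy, fz = _lab_f(x / 0.95047), _lab_f(y / 1.0), _lab_f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def ita_degrees(lab):
    """Individual typology angle in degrees.

    atan2 matches arctan((L - 50) / b) for b > 0 and avoids dividing by zero.
    """
    lightness, _, b = lab
    return math.degrees(math.atan2(lightness - 50.0, b))


# Made-up skin mask pixels for a lighter and a darker example.
light_pixels = [(240, 200, 180), (238, 198, 178), (242, 202, 182)]
dark_pixels = [(90, 60, 45), (88, 58, 43), (92, 62, 47)]

ita_light = ita_degrees(rgb_to_lab(dominant_color(light_pixels)))
ita_dark = ita_degrees(rgb_to_lab(dominant_color(dark_pixels)))
```

Lighter skin colors yield higher ITA values than darker ones, which is why a set of ordered thresholds on the ITA score can separate the ranges.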

Evaluation of the visual skin tone v1 on the diverse set of beauty Pins showed ~3x higher accuracy on the predicted skin tone. Moreover, per range precision, recall and F1-score metrics increased for all ranges. We observed ~10x higher recall and ~6x higher F1-score on darker skin tones. The new model reduced biases in performance across skin tone ranges, and led to a major increase in coverage of skin tone ranges for billions of images in our beauty, women’s and men’s fashion corpora.

Beyond offline evaluation, having humans in the loop can significantly improve performance by integrating feedback from human evaluation, users and communities. For instance, we conducted several rounds of qualitative review and annotation of the skin tone inference results on diverse images to identify new error patterns and inform training data collection and modeling choices, as we iterated on the model. We also leveraged side-by-side comparisons of results in inclusive bug bashes with a diverse group of participants. Regular quantitative and qualitative evaluations help improve quality over time. In production, we ran experiments to evaluate the new skin tone v1, and built dashboards to monitor the diversity of content served.

Visual skin tone ranges V2: Keep learning

While iterating on skin tone v1, we first focused on getting the simpler cases right, such as front-facing faces in beauty portrait images. We later expanded to broader cases: rotated faces, different lighting conditions, occlusions such as facial hair, sunglasses and face masks, and other body parts. As we integrated more images from diverse communities, we learned from the errors of skin tone v1 to develop a more robust skin tone v2. We worked closely with designers to iterate and develop clear labeling guidelines for tens of thousands of images. Iterating on the model and the collection of its training and evaluation data by actively integrating learnings from earlier versions allowed the model to improve over time. This helped expand its application beyond beauty images to the broader context of fashion.

The need to handle more complex images led us to move away from face detection, and to take a new approach for skin tone v2 based on an end-to-end CNN model from the raw images. We first trained a ResNet model to learn skin tone from a more diverse set of images from beauty and fashion, including v1 error cases. This model outperformed v1 when evaluated on larger, more challenging data. We then considered adding skin tone prediction as a new jointly trained head in the multi-task Unified Embedding model. This approach led to further performance improvements, but at the cost of increased complexity and of coupling with the multi-head development and release schedule. Eventually, we used the 2048-dimensional binarized Unified Embedding as input to a multilayer perceptron (MLP), trained using dropout and a softmax with cross-entropy loss to predict skin tone ranges. This led to significant performance enhancements for all ranges, benefiting from the information captured in our existing embedding while requiring far less computation.
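To make the final v2 design concrete, here is a dependency-free sketch of the forward pass and loss of a small MLP with dropout and a softmax cross-entropy objective. The single hidden layer, its size, the ReLU activation, the dropout rate and the random weights are all assumptions; the article only specifies a 2048-dimensional binarized input and an output over the skin tone ranges. A 16-dimensional input stands in for the real embedding to keep the example small, and no training loop is shown.

```python
import math
import random


def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, x) for x in v]


def dropout(v, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in v]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """Negative log-likelihood of the labeled range."""
    return -math.log(probs[target])


def init_params(n_in, n_hidden, n_out, rng):
    def matrix(rows, cols):
        scale = 1.0 / math.sqrt(cols)
        return [[rng.uniform(-scale, scale) for _ in range(cols)]
                for _ in range(rows)]
    return {"w1": matrix(n_hidden, n_in), "b1": [0.0] * n_hidden,
            "w2": matrix(n_out, n_hidden), "b2": [0.0] * n_out}


def forward(x, params, rng=None, dropout_rate=0.0):
    """Return range probabilities; pass rng and a rate to enable dropout (training)."""
    hidden = relu(dense(x, params["w1"], params["b1"]))
    if rng is not None and dropout_rate > 0.0:
        hidden = dropout(hidden, dropout_rate, rng)
    return softmax(dense(hidden, params["w2"], params["b2"]))


rng = random.Random(0)
params = init_params(n_in=16, n_hidden=8, n_out=4, rng=rng)
embedding = [float(rng.getrandbits(1)) for _ in range(16)]  # binarized input

probs = forward(embedding, params)           # inference: dropout disabled
loss = cross_entropy(probs, target=2)        # loss against a made-up label
train_probs = forward(embedding, params, rng=rng, dropout_rate=0.5)
```

Because the classifier consumes a precomputed embedding rather than pixels, inference reduces to a few small matrix multiplications per image.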

Productionizing visual skin tone at scale

To productionize skin tone v1 for billions of beauty and fashion images, we first identified which Pin images were relevant for skin tone prediction. We leveraged several Pinterest signals: Pin2Interest to gather beauty and fashion content, and our embedding-based visual Image Style and Shopping Style signals to filter out irrelevant Pins, such as product images. Narrowing the image corpus in this way helped with both scale and precision.

To generate skin tone ranges for existing and new images for skin tone v1, we used our GPU-enabled C++ service for image-based models, which supports both real-time online extraction and offline extraction in two stages: an ad hoc backfill and a scheduled incremental workflow.

For visual skin tone v2, our embedding-based feature extractor utilizes pre-computed unified visual embedding as input features to the MLP. This approach uses Spark and CPU Hadoop clusters to significantly speed up skin tone classification in a cost-effective manner. Without having to process the image pixels, our embedding-based approach reduces the time needed to compute the backfill for billions of Pin images from nearly a week to under an hour.

Applications

Improving skin tone ranges in search for global audiences

Skin tone ranges provide Pinners the option to filter beauty results by a skin tone range of their choice, represented by four palettes. The improved skin tone models gave us the confidence to make skin tone ranges more prominent in the product and launch internationally in search.

Deploying the new skin tone v1 for beauty search queries first required indexing two things: the skin tone signal, as a discrete feature taking one of four ranges, and the prediction method (face detection or HSV processing). To evaluate skin tone v1 in search, we first gathered qualitative feedback from a diverse set of internal participants and then launched an experiment to assess the online performance at scale. The internal evaluation and the experiment analysis showed a clear improvement in precision and recall for the new model. The model was more accurate at classifying Pin images into their respective skin tone ranges, especially the darker ranges, leading to large gains in precision and coverage in search results. We also noticed that skin tone range adoption rates in English-speaking countries were comparable to those in the U.S., and both increased with the combined launches of the redesigned skin tone range UI and the new skin tone range model.

Skin tone ranges in similar looks for AR Try on

Try on was developed with inclusion in mind at the outset of Pinterest AR, supported by visual skin tone v1. The Similar Looks module in the AR Try on for lipstick experience allows users to discover makeup looks with similar lip styles. By integrating skin tone ranges in Similar Looks, users can filter inspiration looks by a skin tone range of their choice.

To build Similar Looks, the makeup parameters of a beauty Pin are estimated by DNN models trained on a high-quality, human-curated diverse set of tens of thousands of beauty images spanning a wide range of skin tones. First, an embedding-based DNN classifier for the Try-On Taxonomy of Image Style is trained with PyTorch using the Unified Embedding as input. Lipstick parameter extraction is performed using a cascade consisting of a face detector, landmark detector, and DNN-based parameter regressor. The visual skin tone v1 is indexed and combined with a lightweight approach to retrieve Makeup Look Pins in the selected skin tone range with lipstick parameters most similar to the color of the query makeup product in perceptual color space. Together these components form a new kind of visual discovery experience for makeup try-on, connecting individual products to an inspirational and diverse set of beauty Pins.
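The retrieval step can be pictured as a filter followed by a nearest-color ranking. The sketch below assumes the lipstick parameters reduce to a single CIE L*a*b* color and uses the CIE76 color difference (Euclidean distance in LAB) as the similarity measure. The article does not name the color space or the distance, so both are assumptions, as are the Look record and the sample values.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Look:
    """Hypothetical indexed record for a Makeup Look Pin."""
    pin_id: str
    skin_tone_range: int   # one of the four indexed ranges
    lip_lab: tuple         # estimated lipstick color as (L, a, b)


def delta_e76(color_a, color_b):
    """CIE76 color difference: Euclidean distance in L*a*b*."""
    return math.dist(color_a, color_b)


def similar_looks(looks, product_lab, skin_tone_range, k=2):
    """Keep looks in the selected range, then rank by lip color distance."""
    candidates = [look for look in looks
                  if look.skin_tone_range == skin_tone_range]
    candidates.sort(key=lambda look: delta_e76(look.lip_lab, product_lab))
    return candidates[:k]


# Made-up index entries and query product color.
looks = [
    Look("a", 1, (45.0, 60.0, 30.0)),
    Look("b", 3, (40.0, 55.0, 25.0)),
    Look("c", 3, (60.0, 20.0, 10.0)),
    Look("d", 3, (42.0, 58.0, 28.0)),
]
results = similar_looks(looks, product_lab=(41.0, 57.0, 27.0),
                        skin_tone_range=3)
result_ids = [look.pin_id for look in results]
```

Filtering on the discrete range first keeps the ranking cheap, since the distance is only computed for looks the user asked to see.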

Content diversity understanding and diversification

Leveraging diversity signals such as skin tone helps us analyze and understand the diversity of our content, as well as how it is surfaced and engaged with. With skin tone v1, we quadrupled our skin tone range coverage of beauty and fashion content. [Source: Pinterest Internal Data, April 2020] Our skin tone signal is now 3x as likely to detect multiple skin tone ranges in the top search results [Source: Pinterest Internal Data, July 2020], allowing more accurate measurements of the diversity of content served. Such analysis can help inform work around diversification of content inventory and its distribution on Pinterest.
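A measurement of this kind could look like the sketch below, which computes the share of queries whose top results contain more than one detected skin tone range. The metric definition, the cutoff k, and the sample data are assumptions for illustration; the article does not describe how its figure was computed.

```python
def share_with_multiple_ranges(results_by_query, k=10):
    """Fraction of queries whose top-k results span more than one detected range.

    results_by_query maps a query to the ranked list of predicted ranges for
    its results, with None where no skin tone range was detected.
    """
    if not results_by_query:
        return 0.0
    multi = 0
    for ranges in results_by_query.values():
        detected = {r for r in ranges[:k] if r is not None}
        if len(detected) > 1:
            multi += 1
    return multi / len(results_by_query)


# Made-up ranked results for four queries.
sample = {
    "q1": [1, 1, None, 1],
    "q2": [1, 3, 2],
    "q3": [None, None],
    "q4": [4, 2, 4],
}
share = share_with_multiple_ranges(sample)
```

A signal with better coverage turns fewer results into None, so the same result lists can register as more diverse simply because more of them are measurable.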

The road ahead

Through our experience developing skin tone ranges and integrating them in our search and AR Try on products, we learned the importance of building ML systems with inclusion by design and respect for user privacy at the heart of technical choices. In a multi-disciplinary collaboration between engineering and teams spanning many organizations, we are building on this foundation to further improve skin tone ranges, develop diversity signals, diversify search results and recommendations in various surfaces, and expand the inclusive product experience to more content and domains globally.

Acknowledgments

This work is the result of a cross-functional collaboration between many teams. Many thanks to Josh Beal, Laksh Bhasin, Lulu Cheng, Nadia Fawaz, Angela Guo, Edmarc Hedrick, Emma Herold, Ryan James, Nancy Jeng, Bhawna Juneja, Dmitry Kislyuk, Molly Marriner, Candice Morgan, Monica Pangilinan, Seth Dong Huk Park, Zhdan Philippov, Rajat Raina, Chuck Rosenberg, Marta Scotto, Annie Ta, Michael Tran, Eric Tzeng, David Xue.
