Powering Inclusive Search & Recommendations with Our New Visual Skin Tone Model

Pinterest
Pinterest is a social bookmarking site where users collect and share photos of their favorite events, interests and hobbies. One of the fastest growing social networks online, Pinterest is the third-largest such network behind only Facebook and Twitter.

By Nadia Fawaz | Research Scientist & Tech Lead, Applied Science, Bhawna Juneja | Software Engineer, Search Quality, David Xue | Software Engineer, Visual Search


To truly bring everyone the inspiration to create a life they love, Pinterest is committed to content diversity and to developing inclusive search and recommendation engines. A top request we hear from Pinners is that they want to feel represented in the product, which is why we built our first version of skin tone ranges, an inclusive search feature, in 2018. We’re proud to introduce the latest version of skin tone ranges, a newly built in-house technology. These new skin tone ranges are paving the way for more inclusive inspirations to be recommended in search, as well as in our augmented reality technology, Try on, and are driving initiatives for more diverse recommendations across the platform.

Skin tone ranges in Beauty Search and in AR Try-on Similar Looks

Developing more inclusive skin tone ranges

Trying to understand the skin tone range in an image is a complex challenge for computer vision systems, given the impact of shadows, different lighting, and a variety of other impediments. Developing inclusive skin tone ranges required an end-to-end iterative process to build, evaluate and improve performance over several versions. While qualitative evaluation could help reveal issues, in order to make progress, we needed to measure performance gaps across skin tone ranges and understand the error patterns for each range.

A variety of lighting conditions

Starting with diverse data

We labeled a diverse set of beauty images covering a wide range of skin tones to evaluate system performance during development. Measuring performance is important to assess progress; however, coarse aggregate metrics over the entire dataset, such as accuracy, are not sufficient, because aggregation may hide performance discrepancies between skin tone ranges. To quantify performance biases, we went beyond overall aggregates and computed granular metrics per skin tone range, including precision, recall, and F1-score. Per-range metrics show whether errors disproportionately affect some ranges. We also used confusion matrices to analyze error patterns for each range. The matrices reveal whether a model fails to predict a skin tone for images in a range, leading to very low recall and F1-score for that range, or whether it fails to distinguish images from different ranges and misclassifies them, impacting recall and precision for several ranges, as in the examples below.

Examples of issues
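To make the difference between aggregate and per-range metrics concrete, here is a minimal sketch in plain Python. The range names, the toy labels and the convention of using None for "no skin tone predicted" are assumptions for illustration, not our evaluation data or tooling.

```python
from collections import Counter

# Hypothetical range names; the product exposes four ranges.
RANGES = ["range_1", "range_2", "range_3", "range_4"]


def per_range_metrics(y_true, y_pred, ranges):
    """Compute precision, recall and F1 for each skin tone range.

    A prediction of None means the system produced no skin tone at all;
    it counts as a miss for the true range and as nothing for any other.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fn[truth] += 1
            if pred is not None:
                fp[pred] += 1
    metrics = {}
    for r in ranges:
        precision = tp[r] / (tp[r] + fp[r]) if tp[r] + fp[r] else 0.0
        recall = tp[r] / (tp[r] + fn[r]) if tp[r] + fn[r] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[r] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Toy labels: range_4 never receives a prediction, range_2 and range_3 get confused.
y_true = ["range_1", "range_1", "range_2", "range_2",
          "range_3", "range_3", "range_4", "range_4"]
y_pred = ["range_1", "range_1", "range_2", "range_1",
          "range_3", "range_2", None, None]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
confusion = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
metrics = per_range_metrics(y_true, y_pred, RANGES)

print(f"overall accuracy: {accuracy:.2f}")
for r in RANGES:
    print(r, {name: round(value, 2) for name, value in metrics[r].items()})
```

In this toy example the overall accuracy alone says nothing about which range is affected, while the per-range view shows that one range has zero recall and the confusion counts show which ranges are mistaken for each other.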

To understand the root causes of these issues, we performed an error analysis of the components of the skin tone system based on their output. At a high level, a skin tone system may include:

  • a detection model that attempts to determine the presence and location of a face in a beauty image, but does not attempt to recognize an individual person’s face
  • a color extraction module
  • a scorer and thresholder to estimate the skin tone range

Analyzing the score distributions per skin tone range over the diverse dataset can show if the score distributions are separable or if they overlap, and if the thresholds are out-of-phase with the diverse data, as in the example above. Both issues can be amplified by color extraction failures in challenging lighting conditions. Studying face detection errors can reveal if the model fails to detect faces in beauty images with a darker skin tone at significantly higher rates than in images with lighter skin tones, which would preclude the system from generating a skin tone range for these images. This type of bias in face detection models can carry over to the skin tone system, and no amount of downstream post-processing for fairness on the output of the system can correct such upstream bias. Biases in face detection have been analyzed previously in the Gender Shades study by Joy Buolamwini and Timnit Gebru. Requiring face detection to predict skin tone also limits the scope of the system, as it cannot handle images of other body parts such as manicured hands, and it contributes to the overall system latency and scalability challenges.
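One simple way to check for the upstream face detection bias described above is to compare the detection miss rate per labeled skin tone range. The sketch below assumes a hypothetical list of (true range, face detected) records; the data and names are illustrative only.

```python
from collections import defaultdict


def detection_miss_rate_by_range(records):
    """Return the share of images per range where no face was detected.

    `records` is an iterable of (true_range, face_detected) pairs.
    """
    total = defaultdict(int)
    missed = defaultdict(int)
    for true_range, face_detected in records:
        total[true_range] += 1
        if not face_detected:
            missed[true_range] += 1
    return {r: missed[r] / total[r] for r in total}


# Toy records, not real measurements.
records = [
    ("range_1", True), ("range_1", True), ("range_1", True), ("range_1", False),
    ("range_4", False), ("range_4", False), ("range_4", True), ("range_4", False),
]

miss_rates = detection_miss_rate_by_range(records)
gap = max(miss_rates.values()) - min(miss_rates.values())
print(miss_rates, "gap:", gap)
```

A large gap between ranges indicates that some images never reach the color extraction and scoring stages, which no downstream thresholding can repair.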

Through analysis, we reached the conclusion that to improve fairness in performance across all skin tone ranges, we needed to build an end-to-end system with bias mitigation.

Developing new skin tone ranges by mitigating biases

Visual skin tone ranges V1: Mitigating bias

We developed the new visual skin tone v1 ranges based on visual input and focused on:

  • mitigating biases so that skin tone prediction performs well across all ranges
  • creating a signal that doesn’t require the presence of a full front-facing face, but also works for partial faces or other body parts
  • extending to applications beyond beauty, such as fashion
  • leveraging this more reliable signal as a building block to improve fairness and reduce potential bias in other ML models

The visual skin tone v1 leverages several computer vision techniques to estimate the skin tone range in a beauty image. After exposure correction, a face detection model identifies the face area and landmarks corresponding to facial features such as eyes, eyebrows, nose, mouth and face edge. This face detection model has better coverage on images with darker skin tones. Some facial features, such as eyes and lips, are then cropped out, and binary erosion is applied to remove hair and edge noise, producing a face skin mask. If face detection fails to identify a face in the image, for example in images of other body parts, Hue Saturation Value (HSV) processing attempts to locate skin pixels and produces a skin mask. The color extraction module then estimates a dominant color based on the RGB distribution of the skin mask pixels. The dominant color is converted to the LAB color space, and the individual typology angle (ITA) is computed as a nonlinear function of the L and B coordinates. The resulting ITA scores are more separable across ranges. Using a diverse dataset of images, fairness-aware tuning is performed on the ITA scores to produce a skin tone prediction while mitigating biases in performance between ranges.
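The last two steps, converting a dominant color to LAB and computing the ITA, can be sketched as follows. This is a minimal illustration using the standard sRGB (D65) to CIELAB conversion and the commonly cited definition ITA = arctan((L - 50) / b) in degrees. It uses atan2 to avoid dividing by zero when b is 0, which is a choice made for this sketch. The sample colors are made up, and the tuned thresholds that map ITA to a range are not shown.

```python
import math

# Reference white for the D65 illuminant (standard values).
D65_WHITE = (0.95047, 1.0, 1.08883)


def _srgb_to_linear(channel_8bit):
    """Undo the sRGB gamma curve for one 0-255 channel."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _lab_f(t):
    """Piecewise cube-root function from the CIELAB definition."""
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29


def rgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIELAB (L, a, b)."""
    r, g, b = (_srgb_to_linear(c) for c in rgb)
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    fx, fy, fz = (_lab_f(v / n) for v, n in zip((x, y, z), D65_WHITE))
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def ita_degrees(lightness, b):
    """Individual typology angle in degrees from the L and b coordinates."""
    return math.degrees(math.atan2(lightness - 50.0, b))


def ita_from_rgb(rgb):
    lightness, _, b = rgb_to_lab(rgb)
    return ita_degrees(lightness, b)


# Two made-up dominant colors: a lighter and a darker skin-like color.
lighter_ita = ita_from_rgb((240, 200, 170))
darker_ita = ita_from_rgb((90, 60, 40))
print(round(lighter_ita, 1), round(darker_ita, 1))
```

Lighter dominant colors yield larger ITA values and darker ones yield smaller values, which is what makes a single scalar score convenient to threshold into ranges.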

Evaluation of the visual skin tone v1 on the diverse set of beauty Pins showed ~3x higher accuracy on the predicted skin tone. Moreover, per-range precision, recall and F1-score metrics increased for all ranges. We observed ~10x higher recall and ~6x higher F1-score on darker skin tones. The new model reduced biases in performance across skin tone ranges, and led to a major increase in coverage of skin tone ranges for billions of images in our beauty, women’s and men’s fashion corpora.

Beyond offline evaluation, having humans in the loop can significantly improve performance by integrating feedback from human evaluation, users and communities. For instance, we conducted several rounds of qualitative review and annotation of the skin tone inference results on diverse images to identify new error patterns and inform training data collection and modeling choices, as we iterated on the model. We also leveraged side-by-side comparisons of results in inclusive bug bashes with a diverse group of participants. Regular quantitative and qualitative evaluations help improve quality over time. In production, we ran experiments to evaluate the new skin tone v1, and built dashboards to monitor the diversity of content served.

Visual skin tone ranges V2: Keep learning

While iterating on skin tone v1, we first focused on getting the simpler cases right, such as front-facing faces in beauty portrait images. We later expanded to broader cases, including rotated faces, different lighting conditions, occlusions such as facial hair, sunglasses and face masks, and other body parts, and we integrated more images from diverse communities. Along the way, we learned from the errors of skin tone v1 to develop a more robust skin tone v2. We worked closely with designers to iterate and develop clear labeling guidelines for tens of thousands of images. Iterating on the model and the collection of its training and evaluation data by actively integrating learnings from earlier versions allowed the model to improve over time. This helped expand its application beyond beauty images to the broader context of fashion.

The need to handle more complex images led us to move away from face detection, and to take a new approach for skin tone v2 based on an end-to-end CNN model from the raw images. We first trained a ResNet model to learn skin tone from a more diverse set of images from beauty and fashion, including v1 error cases. This model outperformed v1 when evaluated on larger, more challenging data. We then considered adding skin tone prediction as a new jointly trained head in the multi-task Unified Embedding model. This approach led to further performance improvements, but at the cost of increased complexity and of coupling with the multi-head development and release schedule. Eventually, we used the 2048-dimensional binarized Unified Embedding as input to a multilayer perceptron (MLP), trained using dropout and a softmax with cross-entropy loss to predict skin tone ranges. This led to significant performance enhancements for all ranges, benefiting from the information captured in our existing embedding while requiring far less computation.
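To illustrate the shape of the final approach, here is a minimal forward-pass sketch of an MLP with dropout and a softmax cross-entropy loss, written in plain Python so it runs without dependencies. The layer sizes, the single hidden layer, the random weights and the random binary input are assumptions for illustration. The production model takes the 2048-dimensional binarized Unified Embedding as input, and its architecture and training loop are not shown here.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def dropout(values, rate, rng, training):
    """Inverted dropout: zero units at random and rescale the survivors."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target_index):
    return -math.log(probs[target_index])


def init_layer(n_out, n_in, rng):
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(embedding, params, rng, training=False, dropout_rate=0.5):
    (w1, b1), (w2, b2) = params
    hidden = dropout(relu(linear(embedding, w1, b1)), dropout_rate, rng, training)
    return softmax(linear(hidden, w2, b2))


rng = random.Random(0)
EMBED_DIM, HIDDEN_DIM, NUM_RANGES = 16, 8, 4  # toy sizes, not the real 2048-d input
params = (init_layer(HIDDEN_DIM, EMBED_DIM, rng),
          init_layer(NUM_RANGES, HIDDEN_DIM, rng))
# Stand-in for a binarized embedding: a random 0/1 vector.
embedding = [float(rng.random() < 0.5) for _ in range(EMBED_DIM)]

probs = forward(embedding, params, rng, training=False)
loss = cross_entropy(probs, target_index=2)
print([round(p, 3) for p in probs], round(loss, 3))
```

Because the input is a precomputed embedding rather than image pixels, the classifier itself reduces to a few small matrix multiplications per image.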

Productionizing visual skin tone at scale

To productionize skin tone v1 for billions of beauty and fashion images, we first identified which Pin images were relevant for skin tone prediction. We leveraged several Pinterest signals: Pin2Interest to gather beauty and fashion content, and our embedding-based visual Image Style and Shopping Style signals to filter out irrelevant Pins, such as product images. Narrowing the image corpus in this way helped with both scale and precision.

To generate skin tone ranges for existing and new images for skin tone v1, we used our GPU-enabled C++ service for image-based models, which supports both real-time online extraction and offline extraction in two stages: an ad hoc backfill and a scheduled incremental workflow.

For visual skin tone v2, our embedding-based feature extractor uses the pre-computed Unified Embedding as input features to the MLP. This approach runs on Spark and CPU Hadoop clusters, which significantly speeds up skin tone classification in a cost-effective manner. Without having to process the image pixels, our embedding-based approach reduces the time needed to compute the backfill for billions of Pin images from nearly a week to under an hour.

Applications

Improving skin tone ranges in search for global audiences

Skin tone ranges provide Pinners the option to filter beauty results by a skin tone range of their choice, represented by four palettes. The improved skin tone models gave us the confidence to make skin tone ranges more prominent in the product and launch internationally in search.

Deploying the new skin tone v1 for beauty search queries first required indexing two things: the skin tone signal, as a discrete feature among four ranges, and the prediction method (face detection or HSV processing). To evaluate skin tone v1 in search, we first gathered qualitative feedback from a diverse set of internal participants and then launched an experiment to assess the online performance at scale. The internal evaluation and the experiment analysis showed a clear improvement in precision and recall for the new model. The model was more accurate at classifying Pin images into their respective skin tone ranges, especially the darker ranges, leading to large gains in precision and coverage in search results. We also noticed that skin tone range adoption rates in English-speaking countries were comparable to the U.S., and both increased with the combined launches of the redesigned skin tone range UI and the new skin tone range model.

Skin tone ranges in similar looks for AR Try on

Try on was developed with inclusion in mind at the outset of Pinterest AR, supported by visual skin tone v1. The Similar Looks module in the AR Try on for lipstick experience allows users to discover makeup looks with similar lip styles. By integrating skin tone ranges in Similar Looks, users can filter inspiration looks by a skin tone range of their choice.

To build Similar Looks, the makeup parameters of a beauty Pin are estimated by DNN models trained on a high-quality, human-curated, diverse set of tens of thousands of beauty images spanning a wide range of skin tones. First, an embedding-based DNN classifier for the Try-On Taxonomy of Image Style is trained with PyTorch using the Unified Embedding as input. Lipstick parameter extraction is performed using a cascade consisting of a face detector, a landmark detector, and a DNN-based parameter regressor. The visual skin tone v1 is indexed and combined with a lightweight approach to retrieve Makeup Look Pins in the selected skin tone range whose lipstick parameters are most similar to the color of the query makeup product in perceptual color space. Together these components form a new kind of visual discovery experience for makeup try-on, connecting individual products to an inspirational and diverse set of beauty Pins.
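The retrieval step can be pictured as a filter followed by a nearest-color ranking. The sketch below is an assumption-laden illustration: the LookPin record, the integer range ids, the sample colors and the use of Euclidean distance in LAB (the CIE76 color difference) are stand-ins, since the actual lipstick parameters and similarity measure are not specified here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LookPin:
    """Hypothetical indexed record for a Makeup Look Pin."""
    pin_id: str
    skin_tone_range: int   # one of the four indexed ranges
    lip_color_lab: tuple   # estimated lipstick color as (L, a, b)


def color_distance(lab_1, lab_2):
    """CIE76 color difference: Euclidean distance in LAB space."""
    return math.dist(lab_1, lab_2)


def similar_looks(pins, product_lab, selected_range, k=2):
    """Keep Pins in the selected range, then rank by closeness to the product color."""
    candidates = [p for p in pins if p.skin_tone_range == selected_range]
    candidates.sort(key=lambda p: color_distance(p.lip_color_lab, product_lab))
    return candidates[:k]


pins = [
    LookPin("p1", 3, (40.0, 55.0, 30.0)),
    LookPin("p2", 3, (60.0, 20.0, 10.0)),
    LookPin("p3", 1, (41.0, 54.0, 29.0)),  # close color, different range
    LookPin("p4", 3, (45.0, 50.0, 25.0)),
]

results = similar_looks(pins, product_lab=(42.0, 56.0, 28.0), selected_range=3)
print([p.pin_id for p in results])
```

Filtering on the discrete range first keeps the color ranking cheap, since distances are only computed for Pins that already match the user's selection.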

Content diversity understanding and diversification

Leveraging diversity signals such as skin tone helps us analyze and understand the diversity of our content, as well as how it is surfaced and engaged with. With skin tone v1, we quadrupled our skin tone range coverage of beauty and fashion content [Source: Pinterest Internal Data, April 2020]. Our skin tone signal is now 3x as likely to detect multiple skin tone ranges in the top search results [Source: Pinterest Internal Data, July 2020], allowing more accurate measurements of the diversity of content served. Such analysis can help inform work around diversification of content inventory and its distribution on Pinterest.
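One way to express a measurement like "multiple skin tone ranges detected in the top search results" is the share of queries whose top-k results span more than one range. The sketch below uses hypothetical queries, range ids and a cutoff k chosen for illustration; it is not the metric definition behind the numbers above.

```python
def distinct_ranges_in_top_k(result_ranges, k):
    """Count distinct skin tone ranges among the first k results.

    None marks a result with no skin tone signal and is ignored.
    """
    return len({r for r in result_ranges[:k] if r is not None})


def share_with_multiple_ranges(results_by_query, k):
    """Fraction of queries whose top-k results cover more than one range."""
    hits = sum(1 for ranges in results_by_query.values()
               if distinct_ranges_in_top_k(ranges, k) > 1)
    return hits / len(results_by_query)


# Toy results: each list holds the range id of each ranked result for a query.
results_by_query = {
    "query_a": [1, 1, 1, None],
    "query_b": [1, 3, 2, 1],
    "query_c": [None, None, 2, 2],
    "query_d": [4, 1, None, 1],
}

share = share_with_multiple_ranges(results_by_query, k=4)
print(share)
```

Results without a signal are invisible to this kind of metric, which is why higher coverage of the skin tone signal leads to more accurate diversity measurements.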

The road ahead

Through our experience developing skin tone ranges and integrating them in our search and AR Try on products, we learned the importance of building ML systems with inclusion by design and respect for user privacy at the heart of technical choices. In a multi-disciplinary collaboration between engineering and teams spanning many organizations, we are building on this foundation to further improve skin tone ranges, develop diversity signals, diversify search results and recommendations in various surfaces, and expand the inclusive product experience to more content and domains globally.

Acknowledgments

This work is the result of a cross-functional collaboration between many teams. Many thanks to Josh Beal, Laksh Bhasin, Lulu Cheng, Nadia Fawaz, Angela Guo, Edmarc Hedrick, Emma Herold, Ryan James, Nancy Jeng, Bhawna Juneja, Dmitry Kislyuk, Molly Marriner, Candice Morgan, Monica Pangilinan, Seth Dong Huk Park, Zhdan Philippov, Rajat Raina, Chuck Rosenberg, Marta Scotto, Annie Ta, Michael Tran, Eric Tzeng, David Xue.
