How Pinterest Fights Spam Using Machine Learning

Pinterest
Pinterest is a social bookmarking site where users collect and share photos of their favorite events, interests and hobbies. One of the fastest growing social networks online, Pinterest is the third-largest such network behind only Facebook and Twitter.

By Vishwakarma Singh | Trust and Safety Machine Learning Lead


Hundreds of millions of people regularly visit Pinterest to visually discover inspiring ideas among billions of Pins. Inspiration is a high bar and we must be vigilant in ensuring that Pinners don’t see spam, harmful content or misinformation. To enforce our community policies and maintain an inspiring environment, we use the latest in machine learning technology to build automated systems that swiftly detect and act against both spammy content and spammers.

Our anti-spam system combines reactive and proactive components to counter adversarial abusers, that is, users who intentionally try to evade the system. The proactive system consists of sophisticated machine learning models, whereas the reactive system includes both rules executed in a real-time rules engine and lightweight machine learning models. We iterate on these models at regular intervals, adding new data and exploring new techniques to maintain or improve their performance over time.

One tactic malicious actors use is misusing a Pin’s image by linking it to a malicious external website. Our models detect spam vectors, such as Pin links, as well as users engaging in spammy behavior. We quickly limit distribution of Pins with spam links and take direct action against users identified with high confidence as engaging in spammy behavior. To limit false positives, we manually review users identified with low confidence. We notify users of our actions to maintain transparency and give them the option to appeal our decision.
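The confidence-based routing described above can be sketched as a simple threshold rule. This is only an illustration: the function name `route_user`, the two threshold values, and the existence of a "no action" band below the review range are assumptions, since the post does not publish its actual cutoffs or routing logic.

```python
# Hypothetical thresholds; the post does not state real values.
HIGH_CONFIDENCE = 0.95
REVIEW_FLOOR = 0.60


def route_user(spam_score: float) -> str:
    """Map a model score in [0, 1] to an enforcement route."""
    if spam_score >= HIGH_CONFIDENCE:
        # High confidence: act directly; the user is notified and may appeal.
        return "act_and_notify"
    if spam_score >= REVIEW_FLOOR:
        # Lower confidence: a human reviewer confirms before any action.
        return "manual_review"
    return "no_action"
```

Keeping the two cutoffs as named constants makes it easy to trade reviewer workload against false positives without touching the routing code.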

Machine Learning Models

Spam Domain Model

We proactively identify spam Pin links using a Deep Neural Network classifier (shown in Figure 1). To maximize impact, our model learns to classify a domain, rather than an individual link, as spam, and we apply the same enforcement to all Pins with links belonging to that domain. This model is trained interactively on manually labeled domains to achieve higher recall and a lower false positive rate. We use features created from links, web page text and media, user-domain interactions, and user behavior as inputs. For each domain, we sample links and web pages to create features. We split links into semantic tokens and use only frequent tokens as features. We analyze outlying patterns in user actions over time to create behavioral features. Batch inference runs periodically at scale as a PySpark job using TensorFlow, Spark SQL, and a UDF.

Figure 1. Deep Neural Network for domain classification
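To make the link-token features more concrete, here is a minimal sketch in plain Python. It assumes a simple regex split of the URL path and query as a stand-in for the semantic tokenization the post mentions, and it treats a token as "frequent" when it appears in links from at least `min_domains` distinct domains. The function names, the frequency rule, and the example domains are all hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlparse


def tokenize_link(url: str) -> list[str]:
    """Split a URL's path and query into lowercase alphabetic tokens."""
    parsed = urlparse(url)
    text = f"{parsed.path} {parsed.query}".lower()
    return re.findall(r"[a-z]+", text)


def frequent_token_vocab(links_by_domain: dict[str, list[str]],
                         min_domains: int) -> list[str]:
    """Keep tokens seen in sampled links of at least `min_domains` domains."""
    domain_counts = Counter()
    for links in links_by_domain.values():
        # Count each token at most once per domain.
        tokens = {t for link in links for t in tokenize_link(link)}
        domain_counts.update(tokens)
    return sorted(t for t, n in domain_counts.items() if n >= min_domains)


def domain_features(links: list[str], vocab: list[str]) -> list[int]:
    """Bag-of-tokens count vector for one domain's sampled links."""
    counts = Counter(t for link in links for t in tokenize_link(link))
    return [counts[t] for t in vocab]


sampled = {
    "a.example": ["https://a.example/free-gift/claim?ref=win"],
    "b.example": ["https://b.example/claim/free"],
    "c.example": ["https://c.example/recipes/pasta"],
}
vocab = frequent_token_vocab(sampled, min_domains=2)
features = {d: domain_features(links, vocab) for d, links in sampled.items()}
```

Restricting the vocabulary to frequent tokens keeps the feature vector small and avoids learning from one-off strings such as random identifiers.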

Spam User Model

Identifying users engaging in spam activities is the ultimate solution for fighting spam, but it is extremely hard to achieve. We leverage both supervised and unsupervised models to build an effective spam user identification system.

Classification Model

Our spam user classification model is a Deep Neural Network (shown in Figure 2) and is part of our proactive system. It is trained using synthetically labeled data generated with minimal human supervision to ensure quality. We use features created from user attributes and their past behaviors as inputs. We also use user-domain interactions as an input, summarized for each user as a distribution of domain scores, where the domain scores are reused from the spam domain model. Batch inference runs periodically as a PySpark job using TensorFlow, Spark SQL, and a UDF to score millions of Pinners.

Figure 2. Deep Neural Network for user classification
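One plausible way to summarize a user's domain interactions as a score distribution is a fixed-width normalized histogram. The sketch below assumes domain scores lie in [0, 1]; the function name `domain_score_histogram` and the choice of five bins are illustrative, not details taken from the post.

```python
def domain_score_histogram(scores: list[float], num_bins: int = 5) -> list[float]:
    """Normalized histogram of domain spam scores in [0, 1] for one user."""
    if not scores:
        # A user with no linked domains gets an all-zero feature vector.
        return [0.0] * num_bins
    counts = [0] * num_bins
    for s in scores:
        # Clamp so a score of exactly 1.0 falls in the last bin.
        idx = min(int(s * num_bins), num_bins - 1)
        counts[idx] += 1
    total = len(scores)
    return [c / total for c in counts]


# Scores of the domains one user has pinned from (hypothetical values).
user_feature = domain_score_histogram([0.1, 0.9, 1.0, 0.95])
```

A fixed-length histogram gives the classifier the same input shape for every user, regardless of how many domains that user has interacted with.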

Clustering

We have developed lightweight clustering models for early detection of suspicious users and bots. This technique also addresses gaps in our classification models, which are unaware of emerging patterns unless retrained with fresh labeled data. We cluster users on attributes that can isolate suspicious groups with high accuracy. Experts identify these attributes by exploring the behavior of suspicious users and their use of resources for creating spammy content. This model is implemented using PySpark and Spark SQL and runs daily.
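The post does not name the clustering algorithm, so the following sketch assumes the simplest possible form: users are grouped by exact match on a set of expert-chosen attributes, and only groups above a size threshold are surfaced. The attribute names (`signup_ip_prefix`, `user_agent`), the function name, and the threshold are hypothetical, and the logic is shown in plain Python rather than PySpark to stay self-contained.

```python
from collections import defaultdict


def suspicious_clusters(users: list[dict],
                        attributes: tuple[str, ...],
                        min_size: int) -> dict[tuple, list[str]]:
    """Group users sharing identical values on `attributes`; keep large groups."""
    clusters = defaultdict(list)
    for user in users:
        key = tuple(user[a] for a in attributes)
        clusters[key].append(user["user_id"])
    return {k: ids for k, ids in clusters.items() if len(ids) >= min_size}


users = [
    {"user_id": "u1", "signup_ip_prefix": "203.0.113", "user_agent": "bot/1.0"},
    {"user_id": "u2", "signup_ip_prefix": "203.0.113", "user_agent": "bot/1.0"},
    {"user_id": "u3", "signup_ip_prefix": "203.0.113", "user_agent": "bot/1.0"},
    {"user_id": "u4", "signup_ip_prefix": "198.51.100", "user_agent": "browser/9"},
]
flagged = suspicious_clusters(users, ("signup_ip_prefix", "user_agent"), min_size=3)
```

In Spark SQL the same idea would roughly correspond to a `GROUP BY` on the chosen attributes with a `HAVING COUNT(*) >= min_size` filter.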

Spam User-Domain Model

Interactions of users with domains are explicitly captured by a heterogeneous bipartite graph, as shown in Figure 3. We represent users and domains as nodes in the graph and create an edge between a user and a domain if the user has created or saved a Pin with the domain’s link. This graph facilitates simultaneous identification of spam users and domains using semi-supervised learning. We use a small set of labeled users and domains to run a label propagation algorithm and learn scores for the unlabeled users and domains. We implement this iterative algorithm in Spark and run it periodically.

Figure 3. Bipartite graph of users and domains for label propagation
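As an illustration of label propagation on such a graph, the sketch below uses one common variant: labeled seed nodes keep their scores fixed, and each unlabeled node repeatedly takes the average score of its neighbors. The post does not specify its exact update rule, so the averaging rule, the neutral starting score of 0.5, the iteration count, and the `("u", id)` / `("d", id)` node keys are all assumptions.

```python
from collections import defaultdict


def propagate_labels(edges: list[tuple[str, str]],
                     seed_scores: dict[tuple[str, str], float],
                     iterations: int = 20) -> dict[tuple[str, str], float]:
    """Spread spam scores over a user-domain bipartite graph.

    edges: (user_id, domain) pairs.
    seed_scores: labeled nodes, keyed as ("u", user_id) or ("d", domain),
                 with 1.0 meaning spam and 0.0 meaning not spam.
    """
    neighbors = defaultdict(set)
    for user, domain in edges:
        neighbors[("u", user)].add(("d", domain))
        neighbors[("d", domain)].add(("u", user))

    # Unlabeled nodes start at a neutral 0.5; seeds keep their labels.
    scores = {node: seed_scores.get(node, 0.5) for node in neighbors}
    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in seed_scores:
                updated[node] = seed_scores[node]
            else:
                updated[node] = sum(scores[m] for m in nbrs) / len(nbrs)
        scores = updated
    return scores


edges = [("u1", "d1"), ("u2", "d1"), ("u2", "d2"), ("u3", "d2")]
seeds = {("u", "u1"): 1.0, ("u", "u3"): 0.0}  # u1 labeled spam, u3 labeled benign
scores = propagate_labels(edges, seeds)
```

In this toy graph, the domain shared with the labeled spammer ends up with a higher score than the domain shared with the benign user, which is the behavior that lets users and domains be scored together.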

Measurement

We measure spam prevalence on Pinterest by computing the number of Pin impressions that either have spam links or have been created by users engaging in spammy activities. We periodically sample and manually review both impressed Pins and users. We scaled our measurement by first sampling and reviewing highly impressed head domains and then extending coverage to tail domains over time. These samples are used both for measuring overall spam prevalence and for training our machine learning models.
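A rough sketch of how such a measurement could be computed from a reviewed sample follows. It assumes prevalence is reported as a share of impressions, which the post does not state explicitly, and the field names (`impressions`, `spam_link`, `spammy_creator`) and sample values are hypothetical.

```python
def spam_prevalence(reviewed_sample: list[dict]) -> float:
    """Impression-weighted share of reviewed Pins judged to be spam."""
    total = sum(pin["impressions"] for pin in reviewed_sample)
    if total == 0:
        return 0.0
    spam = sum(
        pin["impressions"]
        for pin in reviewed_sample
        # A Pin counts as spam if its link or its creator was judged spammy.
        if pin["spam_link"] or pin["spammy_creator"]
    )
    return spam / total


sample = [
    {"impressions": 900, "spam_link": False, "spammy_creator": False},
    {"impressions": 60, "spam_link": True, "spammy_creator": False},
    {"impressions": 40, "spam_link": False, "spammy_creator": True},
]
prevalence = spam_prevalence(sample)
```

Weighting by impressions rather than by Pin count reflects what Pinners actually see, so a single heavily viewed spam Pin matters more than many unseen ones.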

Conclusion

Pinterest’s mission is to bring everyone the inspiration to create a life they love. We strive to protect our Pinners’ experiences by swiftly and appropriately acting against malicious users and spam content identified by our array of machine learning models. We plan to keep investing in evolving our community guidelines and technology to address the challenges that will inevitably emerge and to bring the best experience to our millions of valued users.

Acknowledgements

Thanks to Yuanfang Song, Omkar Panhalkar, Rundong Liu, Qinglong Zeng, Attila Dobi, Abhijit Mahabal, Alok Singhal, Maisy Samuelson, and the rest of the Trust and Safety team for their contributions in developing machine learning models for spam! Thanks to Harry Shamansky for helping with the publication of the blog post!
