How Pinterest Fights Spam Using Machine Learning

Pinterest
Pinterest is a social bookmarking site where users collect and share photos of their favorite events, interests and hobbies. One of the fastest growing social networks online, Pinterest is the third-largest such network behind only Facebook and Twitter.

By Vishwakarma Singh | Trust and Safety Machine Learning Lead


Hundreds of millions of people regularly visit Pinterest to visually discover inspiring ideas among billions of Pins. Inspiration is a high bar, and we must be vigilant in ensuring that Pinners don’t see spam, harmful content, or misinformation. To enforce our community policies and maintain an inspiring environment, we use recent machine learning techniques to build automated systems that swiftly detect and act against both spammy content and spammers.

Our anti-spam system has both reactive and proactive components to counter adversarial abusers: users who intentionally try to evade the system. The proactive system consists of sophisticated machine learning models, whereas the reactive system includes both rules executed in a real-time rules engine and lightweight machine learning models. Beyond adopting recent modeling techniques, we iterate on these models at regular intervals, adding new data and exploring new technical advances to maintain or improve their performance over time.
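To make the reactive side more concrete, here is a minimal sketch of a rules engine that evaluates events as they arrive. Everything specific in it is hypothetical: the event fields (`domain`, `pins_last_hour`), the blocklist, the threshold of 100 Pins per hour, and the action names. The post does not describe Pinterest’s actual rules or engine.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PinEvent:
    """Hypothetical real-time event emitted when a user saves a Pin."""
    user_id: str
    domain: str
    pins_last_hour: int


@dataclass
class Rule:
    """A named predicate plus the action to take when it matches."""
    name: str
    predicate: Callable[[PinEvent], bool]
    action: str


def evaluate(event: PinEvent, rules: List[Rule]) -> List[str]:
    """Return the actions of every rule whose predicate matches the event."""
    return [rule.action for rule in rules if rule.predicate(event)]


# Hypothetical blocklist and rate threshold, for illustration only.
BLOCKED_DOMAINS = {"spam.example"}
RULES = [
    Rule("blocked_domain", lambda e: e.domain in BLOCKED_DOMAINS, "limit_distribution"),
    Rule("burst_posting", lambda e: e.pins_last_hour > 100, "send_to_review"),
]

print(evaluate(PinEvent("u1", "spam.example", 3), RULES))  # ['limit_distribution']
```

Because rules are plain data, a new one can be added as soon as an abuse pattern is spotted, without retraining anything; that is one plausible reason to pair a rules engine with slower-to-update models.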

One tactic malicious actors use is to misuse a Pin’s image, pairing it with a link to a malicious external website. Our models detect both spam vectors, such as Pin links, and users engaging in spammy behavior. We quickly limit the distribution of Pins with spam links and take direct action against users identified with high confidence as engaging in spammy behavior. Users identified with lower confidence are manually reviewed to limit false positives. We notify users of our actions to maintain transparency and give them the option to appeal our decisions.
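The confidence-based enforcement described above amounts to routing each model score through two thresholds. The sketch below assumes a spam score in [0, 1]; the threshold values 0.95 and 0.70 and the action names are hypothetical, since the post does not publish them.

```python
from enum import Enum


class Enforcement(Enum):
    TAKE_ACTION = "take_action"      # high confidence: act directly and notify the user
    MANUAL_REVIEW = "manual_review"  # lower confidence: a human reviewer decides
    NO_ACTION = "no_action"


# Hypothetical thresholds; real values are not published in the post.
HIGH_CONFIDENCE = 0.95
REVIEW_THRESHOLD = 0.70


def route_user(spam_score: float) -> Enforcement:
    """Map a model's spam score in [0, 1] to an enforcement path."""
    if not 0.0 <= spam_score <= 1.0:
        raise ValueError(f"spam_score must be in [0, 1], got {spam_score}")
    if spam_score >= HIGH_CONFIDENCE:
        return Enforcement.TAKE_ACTION
    if spam_score >= REVIEW_THRESHOLD:
        return Enforcement.MANUAL_REVIEW
    return Enforcement.NO_ACTION


print([route_user(s).value for s in (0.99, 0.8, 0.2)])
# ['take_action', 'manual_review', 'no_action']
```

Raising `REVIEW_THRESHOLD` shrinks the review queue at the cost of missing more spam, while lowering `HIGH_CONFIDENCE` automates more decisions at the cost of more false positives that the appeal path must absorb.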

Machine Learning Models

Spam Domain Model

We proactively identify spam Pin links using a Deep Neural Network classifier (shown in Figure 1). To maximize impact, the model classifies whole domains, rather than individual links, as spam, and we apply the same enforcement to all Pins whose links belong to the same domain. The model is trained interactively on manually labeled domains to achieve higher recall and a lower false positive rate. As inputs, we use features derived from links, web page text and media, user-domain interactions, and user behavior. For each domain, we sample links and web pages to create features. We split links into semantic tokens and use only frequent tokens as features, and we analyze anomalous patterns in user actions over time to create behavioral features. Batch inference runs periodically at scale in a PySpark job that uses TensorFlow, Spark SQL, and a user-defined function (UDF).

Figure 1. Deep Neural Network for domain classification
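The domain keying and link-token features described above can be sketched with the Python standard library. This is an illustration under stated assumptions, not Pinterest’s pipeline: the domain key is simply the host name with a leading `www.` removed (a production system would more likely use a public-suffix list to find the registrable domain), "semantic tokens" are approximated by splitting the host, path, and query on non-alphanumeric characters, and the `min_count` cutoff for frequent tokens is a hypothetical parameter.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set
from urllib.parse import urlsplit

# Stand-in tokenizer: any run of non-alphanumeric characters is a separator.
_SEPARATORS = re.compile(r"[^a-z0-9]+")


def domain_of(link: str) -> str:
    """Host name of a link (lower-cased by urlsplit) without a leading 'www.'."""
    host = urlsplit(link).hostname or ""
    return host[len("www."):] if host.startswith("www.") else host


def group_by_domain(links: Iterable[str]) -> Dict[str, List[str]]:
    """Group links by domain so that one domain-level verdict covers all of them."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for link in links:
        groups[domain_of(link)].append(link)
    return dict(groups)


def link_tokens(link: str) -> List[str]:
    """Split a link's host, path, and query string into lower-case tokens."""
    parts = urlsplit(link.lower())
    text = " ".join([parts.hostname or "", parts.path, parts.query])
    return [token for token in _SEPARATORS.split(text) if token]


def frequent_vocab(links: Iterable[str], min_count: int = 2) -> Set[str]:
    """Keep only tokens that appear at least `min_count` times across links."""
    counts = Counter(token for link in links for token in link_tokens(link))
    return {token for token, count in counts.items() if count >= min_count}


def featurize(link: str, vocab: Set[str]) -> List[str]:
    """Token features for one link, restricted to the frequent vocabulary."""
    return [token for token in link_tokens(link) if token in vocab]


links = [
    "https://www.Deals.example/free-gift-card?id=1",
    "https://deals.example/free-iphone",
    "https://blog.example/recipes/pie",
]
vocab = frequent_vocab(links)               # contains 'deals', 'example', 'free'
print(sorted(group_by_domain(links)))       # ['blog.example', 'deals.example']
print(featurize(links[1], vocab))           # ['deals', 'example', 'free']
```

Restricting features to frequent tokens keeps the input vocabulary bounded and drops one-off identifiers such as `id=1`.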

Spam User Model

Identifying users engaging in spam activities is the ultimate solution for fighting spam, but it is extremely hard to achieve. We leverage both supervised and unsupervised models to build an effective spam user identification system.

Classification Model

Our spam user classification model is a Deep Neural Network (shown in Figure 2) and is part of our proactive system. It is trained on synthetically labeled data, generated with minimal human supervision to ensure label quality. Its inputs include features derived from user attributes and past behavior. We also feed in each user’s interactions with domains, summarized as a per-user distribution of domain scores, where the scores are reused from the spam domain model. A periodic PySpark batch-inference job, using TensorFlow, Spark SQL, and a UDF, scores millions of Pinners with this model.

Figure 2. Deep Neural Network for user classification
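One way to read the per-user "domain scores distribution" is as a normalized histogram of spam domain model scores over the domains a user has interacted with. The sketch below makes that reading concrete; the five equal-width bins, counting every interaction (so a repeated domain counts more than once), and skipping domains that have no score are all assumptions rather than details from the post.

```python
from typing import Dict, List, Sequence


def domain_score_histogram(
    user_domains: Sequence[str],
    domain_scores: Dict[str, float],
    num_bins: int = 5,
) -> List[float]:
    """Normalized histogram of spam domain scores (assumed in [0, 1])
    over the domains a user interacted with, one entry per interaction."""
    counts = [0] * num_bins
    for domain in user_domains:
        score = domain_scores.get(domain)
        if score is None:  # domain not scored by the domain model; skip it
            continue
        # A score of exactly 1.0 would map to index num_bins; clamp to the last bin.
        counts[min(int(score * num_bins), num_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * num_bins


scores = {"recipes.example": 0.05, "deals.example": 0.95, "spam.example": 1.0}
history = ["recipes.example", "deals.example", "spam.example", "unscored.example"]
print(domain_score_histogram(history, scores))  # roughly [0.33, 0.0, 0.0, 0.0, 0.67]
```

A fixed-length vector like this can sit alongside the other user features no matter how many domains a user has touched.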

Clustering

We have developed lightweight clustering models for early detection of suspicious users and bots. This technique also fills gaps in our classification models, which cannot recognize emerging patterns until they are retrained with fresh labeled data. We cluster users on attributes that isolate suspicious groups with high accuracy. Experts identify these attributes by studying the behavior of suspicious users and the resources they use to create spammy content. This model is implemented in PySpark and Spark SQL and runs daily.
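A simple way to cluster users on expert-chosen attributes is to treat users who share identical values for those attributes as one cluster and flag clusters that are unusually large. The attribute names and the `min_size` threshold below are hypothetical, and exact-match grouping is a simplification; the post does not say which clustering method is used. In Spark SQL the same idea is a `GROUP BY` over the attribute columns with a `HAVING COUNT(*) >= min_size` filter.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical attributes an expert might choose; not taken from the post.
CLUSTER_KEYS = ("signup_ip_prefix", "user_agent", "profile_template")


def suspicious_clusters(
    users: Iterable[Dict[str, str]],
    keys: Sequence[str] = CLUSTER_KEYS,
    min_size: int = 3,
) -> Dict[Tuple[str, ...], List[str]]:
    """Group users sharing identical attribute values; keep groups of >= min_size."""
    clusters: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for user in users:
        # Skip users missing any attribute so that blank values do not form a cluster.
        if any(not user.get(key) for key in keys):
            continue
        clusters[tuple(user[key] for key in keys)].append(user["user_id"])
    return {key: ids for key, ids in clusters.items() if len(ids) >= min_size}


bots = [
    {"user_id": f"bot{i}", "signup_ip_prefix": "203.0.113",
     "user_agent": "ua-x", "profile_template": "t1"}
    for i in range(3)
]
regular = {"user_id": "alice", "signup_ip_prefix": "198.51.100",
           "user_agent": "ua-y", "profile_template": "t2"}
print(suspicious_clusters(bots + [regular]))
# {('203.0.113', 'ua-x', 't1'): ['bot0', 'bot1', 'bot2']}
```

Because the grouping needs no labels, it can surface a new bot wave on the day it appears, before the classifier has been retrained on it.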

Spam User-Domain Model

We explicitly capture interactions between users and domains in a heterogeneous bipartite graph, shown in Figure 3. Users and domains are the nodes, and we create an edge between a user and a domain if the user has created or saved a Pin with a link to that domain. This graph lets us identify spam users and spam domains simultaneously using semi-supervised learning: starting from a small set of labeled users and domains, we run a label propagation algorithm to learn scores for the unlabeled users and domains. We implement this iterative algorithm in Spark and run it periodically.

Figure 3. Bipartite graph of users and domains for label propagation
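The post does not specify which label propagation variant is used. The sketch below shows one common form on a user-domain bipartite graph: labeled nodes are clamped to their labels (1.0 for spam, 0.0 for not spam), unlabeled nodes start at a neutral prior, and in each round every unlabeled domain takes the mean score of its users and every unlabeled user takes the mean score of its domains, until no score changes by more than a tolerance. The prior of 0.5, the iteration cap, and the tolerance are illustrative choices; a Spark version would express each round as joins and aggregations rather than Python loops.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def propagate_labels(
    edges: Iterable[Tuple[str, str]],
    user_labels: Dict[str, float],
    domain_labels: Dict[str, float],
    prior: float = 0.5,
    max_iter: int = 100,
    tol: float = 1e-6,
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Clamped label propagation on a user-domain bipartite graph.

    `edges` holds (user, domain) pairs; labels are 1.0 (spam) or 0.0 (not spam).
    Returns spam scores in [0, 1] for every user and domain.
    """
    user_nbrs: Dict[str, Set[str]] = defaultdict(set)
    domain_nbrs: Dict[str, Set[str]] = defaultdict(set)
    for user, domain in edges:
        user_nbrs[user].add(domain)
        domain_nbrs[domain].add(user)

    # Labeled nodes keep their label; unlabeled nodes start at the prior.
    users = {u: user_labels.get(u, prior) for u in set(user_nbrs) | set(user_labels)}
    domains = {d: domain_labels.get(d, prior) for d in set(domain_nbrs) | set(domain_labels)}

    for _ in range(max_iter):
        delta = 0.0
        # Each unlabeled domain takes the mean score of the users linked to it.
        for d, nbrs in domain_nbrs.items():
            if d not in domain_labels:
                new = sum(users[u] for u in nbrs) / len(nbrs)
                delta = max(delta, abs(new - domains[d]))
                domains[d] = new
        # Each unlabeled user takes the mean score of the domains it linked to.
        for u, nbrs in user_nbrs.items():
            if u not in user_labels:
                new = sum(domains[d] for d in nbrs) / len(nbrs)
                delta = max(delta, abs(new - users[u]))
                users[u] = new
        if delta < tol:  # stop once no score moved by more than tol
            break
    return users, domains


edges = [
    ("spammer", "bad.example"), ("spammer", "shady.example"),
    ("new_user", "shady.example"), ("new_user", "bad.example"),
    ("good_user", "recipes.example"), ("other_user", "recipes.example"),
]
users, domains = propagate_labels(
    edges,
    user_labels={"spammer": 1.0, "good_user": 0.0},
    domain_labels={"bad.example": 1.0, "recipes.example": 0.0},
)
print(round(users["new_user"], 3), round(domains["shady.example"], 3), users["other_user"])
# 1.0 1.0 0.0
```

In the example, the unlabeled `shady.example` is pulled toward spam because a known spammer uses it, and `new_user` follows because both of its domains score high, which is the "simultaneous" identification of users and domains the graph enables.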

Measurement

We measure spam prevalence on Pinterest by computing the number of Pin impressions that either have spam links or were created by users engaging in spammy activities. We periodically sample and manually review both impressed Pins and users. To scale this measurement, we first sampled and reviewed Pins from high-impression head domains, then gradually extended coverage to tail domains. These samples are used both to measure overall spam prevalence and to train our machine learning models.
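Given reviewed samples, impression-weighted prevalence can be estimated per stratum and then combined. The sketch below assumes two strata (head and tail domains), that each stratum’s review sample is drawn in proportion to impressions so each reviewed item stands for an impression, and that prevalence is reported as a share of all impressions. The numbers in the example are made up for illustration and are not Pinterest data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stratum:
    """Impressions in one stratum (e.g. head or tail domains) and its review results."""
    name: str
    total_impressions: int  # all Pin impressions attributed to this stratum
    reviewed: int           # sampled impressions that were manually reviewed
    spam: int               # reviewed impressions judged to be spam


def estimate_prevalence(strata: List[Stratum]) -> float:
    """Stratified estimate of the share of impressions that are spam."""
    total = sum(s.total_impressions for s in strata)
    if total == 0:
        raise ValueError("no impressions")
    spam_impressions = 0.0
    for s in strata:
        if s.reviewed == 0:
            raise ValueError(f"stratum {s.name!r} has no reviewed samples")
        # Scale the sample spam rate up to the stratum's impression volume.
        spam_impressions += s.total_impressions * s.spam / s.reviewed
    return spam_impressions / total


# Made-up numbers for illustration only.
strata = [
    Stratum("head", total_impressions=9_000_000, reviewed=1_000, spam=10),
    Stratum("tail", total_impressions=1_000_000, reviewed=500, spam=25),
]
print(estimate_prevalence(strata))  # 0.014
```

Stratifying gives tail domains their own sample even though head domains dominate impressions, which fits the head-first, then-tail rollout described above.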

Conclusion

Pinterest’s mission is to bring everyone the inspiration to create a life they love. We strive to protect Pinners’ experiences by swiftly and appropriately acting against malicious users and spam content identified by our machine learning models. We plan to keep investing in evolving our community guidelines and technology to address new challenges as they inevitably emerge and to bring the best experience to our millions of users.

Acknowledgements

Thanks to Yuanfang Song, Omkar Panhalkar, Rundong Liu, Qinglong Zeng, Attila Dobi, Abhijit Mahabal, Alok Singhal, Maisy Samuelson, and the rest of the Trust and Safety team for their contributions in developing machine learning models for spam! Thanks to Harry Shamansky for helping with the publication of the blog post!
