Pinterest Flink Deployment Framework

2,190
Pinterest
Pinterest's profile on StackShare is not actively maintained, so the information here may be out of date.

By Rainie Li | Software Engineer, Stream Processing Platform Team


Background

At Pinterest, stream processing allows us to unlock value from real time data for pinners and partners. The Stream Processing Platform team is working on building a reliable and scalable platform to support many critical streaming applications including real-time experiment analytics and real time machine learning signals.

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. It provides features including exactly-once guarantees, low latency, high throughput, and powerful computation model. At Pinterest, we adopt Flink as the unified streaming processing engine.

Requirements

Standardize Flink Build

At Pinterest, we use Bazel as a build system. We need a standardized Bazel rule to build all Flink jobs without changing Makefiles. Once build is done, instead of asking users to copy Flink jars to YARN clusters, jars should be automatically uploaded to remote storage.

Deployment and Operations History

Users used to copy Flink jars to YARN clusters and manually run commands. It was hard to track previous execution histories if we needed to recover failed jobs. We need to provide standard Flink operations such as launching, killing, triggering savepoint, and resuming jobs from the most recent savepoint.

Job Deduplication

Flink applications are deployed as services, therefore one instance should be running at a time for each Flink application. We need to prevent cases when users accidentally deploy twice for the same job, meaning both instances might write to the same Kafka topic. This would mean double writes to Kafka and could affect downstream jobs.

Deployment Framework

We built our Flink deployment framework on top of Bazel, Hermez (internal continuous deployment platform), Job Submission Service (internal service), and YARN clusters.

Figure 1. Deployment high level architecture

Create Bazel BUILD file

The BUILD file needs to contain load(“flink_release”). Users also need to insert a Bazel rule like this:

Define Hermez Deployment File

Hermez is the Pinterest Continuous Deployment System. In order to launch a Flink job with Hermez, users need to create a Hermez.yml file. This file contains information including which YARN cluster Flink jobs to run in, what YARN parameters to use, what resources to use, etc. For each instance of Flink job, users should set up a separate YAML file. For example, if users run their jobs in dev, staging, and prod environments, they will need to have three different YAML files (one for each environment).

Here’s an example of yml file:

Automatically Flink Job Building

The following numbers are referring to steps in Figure 1: Deployment high level architecture

Whenever a user lands a change to Git repo, Jenkins job will be triggered to build Flink job JARs (1). Jenkins job will follow flink_relase rules that are described in the BUILD file to build Flink JAR and upload it to the S3 bucket (3). Meanwhile, it will upload deployment related Hermez YAML files to Artifactory (2). Hermez monitors Artifactory; when it sees a new yml file, it will display it on UI to allow users to launch a job using that yml (5).

Flink Job Launching

When users launch a Flink job, Hermez converts the yml file into a JSON and submits it to Job Submission Service (JSS) (6). JSS is a service maintained by Pinterest that has the ability to schedule and launch Flink jobs to YARN clusters.

JSS examines the request and ensures that Flink JARs and Flink job state exist in S3 (7). If everything is alright, JSS will first launch a shell-runner job which will execute a command on a YARN cluster cluster (8). The shell-runner job downloads the Flink job’s JAR from S3 and then kicks off the actual Flink job using the configuration provided by JSS (9). The reason we add a shell-runner job is to keep JSS as a thin layer without dealing with different compute engine clients (Flink, Spark, MapReduce, etc.) and different configurations for each cluster.

JSS Deduplication

When resuming a Flink job, we provide several options including resume from most recent savepoint or checkpoint, fresh state, and specify a savepoint or checkpoint path. Job deduplication features ensure that there is only one instance of your Flink job running at a time.

The way job deduplication works is that each job has a unique name when a job is submitted. If there is already an instance of the job running, JSS will trigger a safepoint and stop it first, then submit the new job. If the stop request fails because savepoint fails, then the submitted request will fail and the running instance remains running. If there is one deployment in progress, the new job submission would be rejected

Flink Job Configuration Hotfix

Due to Flink configuration being packaged together with Flink job binary, users used to check in config changes to Repo and rebuild the package. This whole process could take more than 10 minutes. This can be a problem if we would like to quickly adjust parameters during incidents. For example, when Flink jobs failed in production due to lack of resources, we used to go through the entire build process to rollout resource config changes. After the incidents got resolved, we needed to check in another change to roll back these configs. To speed up this process, we provide a hotfix feature on Hermez to overwrite Flink job configuration without code change. Users can adjust Flink configuration values during deployment. Behind the scenes, Hermez will directly overwrite these values in ymls which Hermez read from Artifactory.

What’s Next

Reducing Deployment Latency

The current approach launches shell-runner first. Then, shell-runner launches Flink jobs to YARN clusters which could increase latency. We plan to improve this process to reduce end-to-end Flink job launch time.

Automatically Job Failover

To further improve platform and Flink application availability, we built YARN clusters in multiple AWS Availability Zones (AZ) to provide backup when one cluster or one AZ become unavailable. We are also building a service that could automatically detect any cluster failure and failover failed jobs to backup clusters in different AZs or detect application failures and restart the application automatically.

Stay tuned!

Acknowledgments

Thanks to Steven Bairos-Novak and Yu Yang for their countless contributions. Thanks Ang Zhang for updating this blog. This project is a joint effort across multiple teams at Pinterest. Thanks to the Engineering Productivity Team for Hermez support.

Pinterest
Pinterest's profile on StackShare is not actively maintained, so the information here may be out of date.
Tools mentioned in article
Open jobs at Pinterest
Sr. Staff Software Engineer, Ads ML I...
San Francisco, CA, US; , CA, US
<div class="content-intro"><p><strong>About Pinterest</strong><span style="font-weight: 400;">:&nbsp;&nbsp;</span></p> <p>Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love.&nbsp;In your role, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping&nbsp;Pinners&nbsp;make their lives better in the positive corner of the internet.</p> <p>Creating a life you love also means finding a career that celebrates the unique perspectives and experiences that you bring. As you read through the expectations of the position, consider how your skills and experiences may complement the responsibilities of the role. We encourage you to think through your relevant and transferable skills from prior experiences.</p> <p><em>Our new progressive work model is called PinFlex, a term that’s uniquely Pinterest to describe our flexible approach to living and working. Visit our </em><a href="https://www.pinterestcareers.com/pinflex/" target="_blank"><em><u>PinFlex</u></em></a><em> landing page to learn more.&nbsp;</em></p></div><p>Pinterest is one of the fastest growing online advertising platforms. Continued success depends on the machine-learning systems, which crunch thousands of signals in a few hundred milliseconds, to identify the most relevant ads to show to pinners. You’ll join a talented team with high impact, which designs high-performance and efficient ML systems, in order to power the most critical and revenue-generating models at Pinterest.</p> <p><strong>What you’ll do</strong></p> <ul> <li>Being the technical leader of the Ads ML foundation evolution movement to 2x Pinterest revenue and 5x ad performance in next 3 years.</li> <li>Opportunities to use cutting edge ML technologies including GPU and LLMs to empower 100x bigger models in next 3 years.&nbsp;</li> <li>Tons of ambiguous problems and you will be tasked with building 0 to 1 solutions for all of them.</li> </ul> <p><strong>What we’re looking for:</strong></p> <ul> <li>BS (or higher) degree in Computer Science, or a related field.</li> <li>10+ years of relevant industry experience in leading the design of large scale &amp; production ML infra systems.</li> <li>Deep knowledge with at least one state-of-art programming language (Java, C++, Python).&nbsp;</li> <li>Deep knowledge with building distributed systems or recommendation infrastructure</li> <li>Hands-on experience with at least one modeling framework (Pytorch or Tensorflow).&nbsp;</li> <li>Hands-on experience with model / hardware accelerator libraries (Cuda, Quantization)</li> <li>Strong communicator and collaborative team player.</li> </ul><div class="content-pay-transparency"><div class="pay-input"><div class="description"><p>At Pinterest we believe the workplace should be equitable, inclusive, and inspiring for every employee. In an effort to provide greater transparency, we are sharing the base salary range for this position. The position is also eligible for equity. Final salary is based on a number of factors including location, travel, relevant prior experience, or particular skills and expertise.</p> <p><em><span style="font-weight: 400;">Information regarding the culture at Pinterest and benefits available for this position can be found <a href="https://www.pinterestcareers.com/pinterest-life/" target="_blank">here</a>.</span></em></p></div><div class="title">US based applicants only</div><div class="pay-range"><span>$135,150</span><span class="divider">&mdash;</span><span>$278,000 USD</span></div></div></div><div class="content-conclusion"><p><strong>Our Commitment to Diversity:</strong></p> <p>Pinterest is an equal opportunity employer and makes employment decisions on the basis of merit. We want to have the best qualified people in every job. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, protected veteran status, or any other characteristic under federal, state, or local law. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you require an accommodation during the job application process, please notify&nbsp;<a href="mailto:accessibility@pinterest.com">accessibility@pinterest.com</a>&nbsp;for support.</p></div>
Senior Staff Machine Learning Enginee...
San Francisco, CA, US; , CA, US
<div class="content-intro"><p><strong>About Pinterest</strong><span style="font-weight: 400;">:&nbsp;&nbsp;</span></p> <p>Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love.&nbsp;In your role, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping&nbsp;Pinners&nbsp;make their lives better in the positive corner of the internet.</p> <p>Creating a life you love also means finding a career that celebrates the unique perspectives and experiences that you bring. As you read through the expectations of the position, consider how your skills and experiences may complement the responsibilities of the role. We encourage you to think through your relevant and transferable skills from prior experiences.</p> <p><em>Our new progressive work model is called PinFlex, a term that’s uniquely Pinterest to describe our flexible approach to living and working. Visit our </em><a href="https://www.pinterestcareers.com/pinflex/" target="_blank"><em><u>PinFlex</u></em></a><em> landing page to learn more.&nbsp;</em></p></div><p>We are looking for a highly motivated and experienced Machine Learning Engineer to join our team and help us shape the future of machine learning at Pinterest. In this role, you will tackle new challenges in machine learning that will have a real impact on the way people discover and interact with the world around them.&nbsp; You will collaborate with a world-class team of research scientists and engineers to develop new machine learning algorithms, systems, and applications that will bring step-function impact to the business metrics (recent publications <a href="https://arxiv.org/abs/2205.04507">1</a>, <a href="https://dl.acm.org/doi/abs/10.1145/3523227.3547394">2</a>, <a href="https://arxiv.org/abs/2306.00248">3</a>).&nbsp; You will also have the opportunity to work on a variety of exciting projects in the following areas:&nbsp;</p> <ul> <li>representation learning</li> <li>recommender systems</li> <li>graph neural network</li> <li>natural language processing (NLP)</li> <li>inclusive AI</li> <li>reinforcement learning</li> <li>user modeling</li> </ul> <p>You will also have the opportunity to mentor junior researchers and collaborate with external researchers on cutting-edge projects.&nbsp;&nbsp;</p> <p><strong>What you'll do:&nbsp;</strong></p> <ul> <li>Lead cutting-edge research in machine learning and collaborate with other engineering teams to adopt the innovations into Pinterest problems</li> <li>Collect, analyze, and synthesize findings from data and build intelligent data-driven model</li> <li>Scope and independently solve moderately complex problems; write clean, efficient, and sustainable code</li> <li>Use machine learning, natural language processing, and graph analysis to solve modeling and ranking problems across growth, discovery, ads and search</li> </ul> <p><strong>What we're looking for:</strong></p> <ul> <li>Mastery of at least one systems languages (Java, C++, Python) or one ML framework (Pytorch, Tensorflow, MLFlow)</li> <li>Experience in research and in solving analytical problems</li> <li>Strong communicator and team player. Being able to find solutions for open-ended problems</li> <li>8+ years working experience in the r&amp;d or engineering teams that build large-scale ML-driven projects</li> <li>3+ years experience leading cross-team engineering efforts that improves user experience in products</li> <li>MS/PhD in Computer Science, ML, NLP, Statistics, Information Sciences or related field</li> </ul> <p><strong>Desired skills:</strong></p> <ul> <li>Strong publication track record and industry experience in shipping machine learning solutions for large-scale challenges&nbsp;</li> <li>Cross-functional collaborator and strong communicator</li> <li>Comfortable solving ambiguous problems and adapting to a dynamic environment</li> </ul> <p>This position is not eligible for relocation assistance.</p> <p>#LI-SA1</p> <p>#LI-REMOTE</p><div class="content-pay-transparency"><div class="pay-input"><div class="description"><p>At Pinterest we believe the workplace should be equitable, inclusive, and inspiring for every employee. In an effort to provide greater transparency, we are sharing the base salary range for this position. The position is also eligible for equity. Final salary is based on a number of factors including location, travel, relevant prior experience, or particular skills and expertise.</p> <p><em><span style="font-weight: 400;">Information regarding the culture at Pinterest and benefits available for this position can be found <a href="https://www.pinterestcareers.com/pinterest-life/" target="_blank">here</a>.</span></em></p></div><div class="title">US based applicants only</div><div class="pay-range"><span>$158,950</span><span class="divider">&mdash;</span><span>$327,000 USD</span></div></div></div><div class="content-conclusion"><p><strong>Our Commitment to Diversity:</strong></p> <p>Pinterest is an equal opportunity employer and makes employment decisions on the basis of merit. We want to have the best qualified people in every job. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, protected veteran status, or any other characteristic under federal, state, or local law. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you require an accommodation during the job application process, please notify&nbsp;<a href="mailto:accessibility@pinterest.com">accessibility@pinterest.com</a>&nbsp;for support.</p></div>
Staff Software Engineer, ML Training
San Francisco, CA, US; , CA, US
<div class="content-intro"><p><strong>About Pinterest</strong><span style="font-weight: 400;">:&nbsp;&nbsp;</span></p> <p>Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love.&nbsp;In your role, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping&nbsp;Pinners&nbsp;make their lives better in the positive corner of the internet.</p> <p>Creating a life you love also means finding a career that celebrates the unique perspectives and experiences that you bring. As you read through the expectations of the position, consider how your skills and experiences may complement the responsibilities of the role. We encourage you to think through your relevant and transferable skills from prior experiences.</p> <p><em>Our new progressive work model is called PinFlex, a term that’s uniquely Pinterest to describe our flexible approach to living and working. Visit our </em><a href="https://www.pinterestcareers.com/pinflex/" target="_blank"><em><u>PinFlex</u></em></a><em> landing page to learn more.&nbsp;</em></p></div><p>The ML Platform team provides foundational tools and infrastructure used by hundreds of ML engineers across Pinterest, including recommendations, ads, visual search, growth/notifications, trust and safety. We aim to ensure that ML systems are healthy (production-grade quality) and fast (for modelers to iterate upon).</p> <p>We are seeking a highly skilled and experienced Staff Software Engineer to join our ML Training Infrastructure team and lead the technical strategy. The ML Training Infrastructure team builds platforms and tools for large-scale training and inference, model lifecycle management, and deployment of models across Pinterest. ML workloads are increasingly large, complex, interdependent and the efficient use of ML accelerators is critical to our success. We work on various efforts related to adoption, efficiency, performance, algorithms, UX and core infrastructure to enable the scheduling of ML workloads.</p> <p>You’ll be part of the ML Platform team in Data Engineering, which aims to ensure healthy and fast ML in all of the 40+ ML use cases across Pinterest.</p> <p><strong>What you’ll do:</strong></p> <ul> <li>Implement cost effective and scalable solutions to allow ML engineers to scale their ML training and inference workloads on compute platforms like Kubernetes.</li> <li>Lead and contribute to key projects; rolling out GPU sharing via MIGs and MPS , intelligent resource management, capacity planning, fault tolerant training.</li> <li>Lead the technical strategy and set the multi-year roadmap for ML Training Infrastructure that includes ML Compute and ML Developer frameworks like PyTorch, Ray and Jupyter.</li> <li>Collaborate with internal clients, ML engineers, and data scientists to address their concerns regarding ML development velocity and enable the successful implementation of customer use cases.</li> <li>Forge strong partnerships with tech leaders in the Data and Infra organizations to develop a comprehensive technical roadmap that spans across multiple teams.</li> <li>Mentor engineers within the team and demonstrate technical leadership.</li> </ul> <p><strong>What we’re looking for:</strong></p> <ul> <li>7+ years of experience in software engineering and machine learning, with a focus on building and maintaining ML infrastructure or Batch Compute infrastructure like YARN/Kubernetes/Mesos.</li> <li>Technical leadership experience, devising multi-quarter technical strategies and driving them to success.</li> <li>Strong understanding of High Performance Computing and/or and parallel computing.</li> <li>Ability to drive cross-team projects; Ability to understand our internal customers (ML practitioners and Data Scientists), their common usage patterns and pain points.</li> <li>Strong experience in Python and/or experience with other programming languages such as C++ and Java.</li> <li>Experience with GPU programming, containerization, orchestration technologies is a plus.</li> <li>Bonus point for experience working with cloud data processing technologies (Apache Spark, Ray, Dask, Flink, etc.) and ML frameworks such as PyTorch.</li> </ul> <p>This position is not eligible for relocation assistance.</p> <p>#LI-REMOTE</p> <p><span data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;#LI-AH2&quot;}" data-sheets-userformat="{&quot;2&quot;:14464,&quot;10&quot;:2,&quot;14&quot;:{&quot;1&quot;:2,&quot;2&quot;:0},&quot;15&quot;:&quot;Helvetica Neue&quot;,&quot;16&quot;:12}">#LI-AH2</span></p><div class="content-pay-transparency"><div class="pay-input"><div class="description"><p>At Pinterest we believe the workplace should be equitable, inclusive, and inspiring for every employee. In an effort to provide greater transparency, we are sharing the base salary range for this position. The position is also eligible for equity. Final salary is based on a number of factors including location, travel, relevant prior experience, or particular skills and expertise.</p> <p><em><span style="font-weight: 400;">Information regarding the culture at Pinterest and benefits available for this position can be found <a href="https://www.pinterestcareers.com/pinterest-life/" target="_blank">here</a>.</span></em></p></div><div class="title">US based applicants only</div><div class="pay-range"><span>$135,150</span><span class="divider">&mdash;</span><span>$278,000 USD</span></div></div></div><div class="content-conclusion"><p><strong>Our Commitment to Diversity:</strong></p> <p>Pinterest is an equal opportunity employer and makes employment decisions on the basis of merit. We want to have the best qualified people in every job. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, protected veteran status, or any other characteristic under federal, state, or local law. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you require an accommodation during the job application process, please notify&nbsp;<a href="mailto:accessibility@pinterest.com">accessibility@pinterest.com</a>&nbsp;for support.</p></div>
Distinguished Engineer, Frontend
San Francisco, CA, US; , US
<div class="content-intro"><p><strong>About Pinterest</strong><span style="font-weight: 400;">:&nbsp;&nbsp;</span></p> <p>Millions of people across the world come to Pinterest to find new ideas every day. It’s where they get inspiration, dream about new possibilities and plan for what matters most. Our mission is to help those people find their inspiration and create a life they love.&nbsp;In your role, you’ll be challenged to take on work that upholds this mission and pushes Pinterest forward. You’ll grow as a person and leader in your field, all the while helping&nbsp;Pinners&nbsp;make their lives better in the positive corner of the internet.</p> <p>Creating a life you love also means finding a career that celebrates the unique perspectives and experiences that you bring. As you read through the expectations of the position, consider how your skills and experiences may complement the responsibilities of the role. We encourage you to think through your relevant and transferable skills from prior experiences.</p> <p><em>Our new progressive work model is called PinFlex, a term that’s uniquely Pinterest to describe our flexible approach to living and working. Visit our </em><a href="https://www.pinterestcareers.com/pinflex/" target="_blank"><em><u>PinFlex</u></em></a><em> landing page to learn more.&nbsp;</em></p></div><p>As a Distinguished Engineer at Pinterest, you will play a pivotal role in shaping the technical direction of our platform, driving innovation, and providing leadership to our engineering teams. You'll be at the forefront of developing cutting-edge solutions that impact millions of users.</p> <p><strong>What you’ll do:</strong></p> <ul> <li>Advise executive leadership on highly complex, multi-faceted aspects of the business, with technological and cross-organizational impact.</li> <li>Serve as a technical mentor and role model for engineering teams, fostering a culture of excellence.</li> <li>Develop cutting-edge innovations with global impact on the business and anticipate future technological opportunities.</li> <li>Serve as strategist to translate ideas and innovations into outcomes, influencing and driving objectives across Pinterest.</li> <li>Embed systems and processes that develop and connect teams across Pinterest to harness the diversity of thought, experience, and backgrounds of Pinployees.</li> <li>Integrate velocity within Pinterest; mobilizing the organization by removing obstacles and enabling teams to focus on achieving results for the most important initiatives.</li> </ul> <p>&nbsp;<strong>What we’re looking for:</strong>:</p> <ul> <li>Proven experience as a distinguished engineer, fellow, or similar role in a technology company.</li> <li>Recognized as a pioneer and renowned technical authority within the industry, often globally, requiring comprehensive expertise in leading-edge theories and technologies.</li> <li>Deep technical expertise and thought leadership that helps accelerate adoption of the very best engineering practices, while maintaining knowledge on industry innovations, trends and practices.</li> <li>Ability to effectively communicate with and influence key stakeholders across the company, at all levels of the organization.</li> <li>Experience partnering with cross-functional project teams on initiatives with significant global impact.</li> <li>Outstanding problem-solving and analytical skills.</li> </ul> <p>&nbsp;</p> <p>This position is not eligible for relocation assistance.</p> <p>&nbsp;</p> <p>#LI-REMOTE</p> <p>#LI-NB1</p><div class="content-pay-transparency"><div class="pay-input"><div class="description"><p>At Pinterest we believe the workplace should be equitable, inclusive, and inspiring for every employee. In an effort to provide greater transparency, we are sharing the base salary range for this position. The position is also eligible for equity. Final salary is based on a number of factors including location, travel, relevant prior experience, or particular skills and expertise.</p> <p><em><span style="font-weight: 400;">Information regarding the culture at Pinterest and benefits available for this position can be found <a href="https://www.pinterestcareers.com/pinterest-life/" target="_blank">here</a>.</span></em></p></div><div class="title">US based applicants only</div><div class="pay-range"><span>$242,029</span><span class="divider">&mdash;</span><span>$498,321 USD</span></div></div></div><div class="content-conclusion"><p><strong>Our Commitment to Diversity:</strong></p> <p>Pinterest is an equal opportunity employer and makes employment decisions on the basis of merit. We want to have the best qualified people in every job. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, protected veteran status, or any other characteristic under federal, state, or local law. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you require an accommodation during the job application process, please notify&nbsp;<a href="mailto:accessibility@pinterest.com">accessibility@pinterest.com</a>&nbsp;for support.</p></div>
You may also like