Site Reliability Engineer / Systems Engineer
At Segment, we believe companies should be able to send their data wherever they want, whenever they want, with no fuss. We make this easy with a single platform that collects, stores, and sends data to hundreds of business tools with the flip of a switch. Our goal is to make using data easy, and we’re looking for people to join us on the journey. We are excited about building toward a world where engineers at other companies spend their time working on their core product, rather than spending nights and weekends tweaking their customer data into various formats for third-party tools.
- Build software that improves the reliability, performance, and efficiency of Segment’s high-throughput, large-scale SaaS platform.
- Collaborate with the entire engineering team on projects as the expert on reliability, performance, and efficiency.
- Automate away the process of managing capacity, safely deploying software, and mitigating failures.
- Troubleshoot and mitigate the thorniest problems in our most mission-critical systems. Advise the team during postmortems on effectively avoiding repeated incidents.
- Share a 24x7 on-call rotation with the other engineers in your focus area.
- Work with cutting-edge technology, share with others through open source, and spread your expertise through contributions to our engineering blog.
- CS Degree and/or a demonstrable, solid understanding of CS fundamentals.
- Proficient coder: strong with at least one programming language.
- Solid grasp of Linux systems and networking concepts.
- Drive to dig into problems and burrow until the solution is found.
- Excellent communicator; writes great documentation.
- Experience operating large-scale, distributed systems on top of cloud infrastructure such as Amazon Web Services or Google Cloud Platform.
- Broad understanding of the OS and of networking protocols with demonstrated ability to apply this understanding to solve real problems.
- Strong proficiency with OS tuning and expertise at the application of debugging tools.
- Strong sense of urgency and ownership over critical problem areas.
- Demonstrable experience effectively coordinating response for outages and incidents.
- Rare ability to inspire engineering teams to up their reliability game.