Skytap saas enterprise-software cloud-computing
Seattle, WA

Senior Site Reliability Engineer: Observability

Skytap is the only public cloud designed specifically for the enterprise. We help businesses execute their cloud and DevOps migration strategies faster. Skytap uniquely enables lift-and-shift of traditional datacenter-native applications into the cloud with minimal technical changes. Once in Skytap, customers can instantly clone, share, and manage complete working application environments, enabling them to modernize software delivery cycles and application architectures.

Skytap is looking for a talented Site Reliability Engineer or Software Developer to join our Observability team. The mission of the Observability team is to provide the tools needed for visibility and insight into the health and performance of our system, and to enable teams to make informed, data-driven decisions.

As a member of this team, you will work with other highly skilled and experienced engineers to design and implement software and solutions that are effective, efficient, and a pleasure to use. You will also act as a subject matter expert and assist other teams in leveling up the observability of their services and of the system as a whole. You will support the analytics needs of teams across the company, on both the technical and business sides. You’ll champion best practices, such as improving the signal-to-noise ratio of alerts.

Our services and infrastructure are built primarily with open source systems and languages including Linux, Puppet, Python, and Ruby, and we use our own product to power the dev and test cycle (that's right: we run Skytap in Skytap. Inception FTW.)
  • Can take on full lifecycle ownership (development, testing, deployment, operations)
  • Has experience working with large, complex distributed systems
  • Has interest in data collection, transport, aggregation, and visualization
  • Has knowledge of and passion for machine learning and analytics
  • Has experience or interest in distributed tracing
  • Has experience designing, building, and maintaining data ingestion pipelines
  • Has software development experience using languages such as Python, Ruby, Golang, or Elixir
  • Can work with stakeholders to understand their needs and requirements and turn them into a working design
  • Can support our internal customers in leveraging the technologies and tools that we provide
  • Can communicate interesting findings to a wide audience
  • Knowledge of and experience with Kubernetes and containers is a plus
  • Grafana
  • ELK stack
  • InfluxData (TICK) stack: Telegraf, InfluxDB, Kapacitor
  • Zabbix
  • MySQL
  • RabbitMQ
  • syslog-ng
  • Work with this stack