Senior Software Engineer, Site Reliability

Upstart · Fintech · Remote · Engineering

Upstart is an AI lending marketplace that uses advanced AI and machine learning to reshape access to credit. The Site Reliability Engineering (SRE) team owns the reliability, resiliency, and observability of Upstart's production systems, building tooling and automation to monitor health, improve incident response, and automate toil. This role focuses on Site Reliability Tooling and shapes the future direction of SRE at Upstart, with impact on both the SRE team and the company as a whole.

What you'd actually do

  1. Embody and share SRE principles at Upstart
  2. Apply state-of-the-art SRE practices throughout the company
  3. Uphold a culture of visibility, ownership, and responsibility around service reliability
  4. Implement standards for monitoring microservices, web apps, mobile apps, databases, Kubernetes clusters, and machine learning platforms in a fast-paced environment
  5. Improve incident response practices, both within SRE and throughout the company
  6. Automate away toil wherever automation makes sense

Skills

Required

  • SRE principles
  • monitoring microservices
  • web apps
  • mobile apps
  • databases
  • Kubernetes clusters
  • machine learning platforms
  • incident response
  • automation

What the JD emphasized

  • machine learning platforms