Site Reliability Manager, Shopping SRE

Google · Big Tech · Pittsburgh, PA, plus one other location

Manage a team of Software/Systems Engineers focused on Site Reliability Engineering for AI-powered systems within the Google Shopping domain. The role involves ensuring the scalability, reliability, and performance of AI-driven infrastructure, with a focus on automation and end-to-end service availability.

What you'd actually do

  1. Lead a team of Software/Systems Engineers on projects for users and be directly responsible for uptime.
  2. Own end-to-end availability and performance of key services and build automation to prevent problem recurrence. Automate response to all non-exceptional service conditions.
  3. Lead by example, mentor the team and establish credibility through quality technical execution.
  4. Design, write and deliver software to improve the availability, scalability, latency and efficiency of Google's services.
  5. Manage on-call rotations across continents, using a follow-the-sun model.

Skills

Required

  • data structures
  • algorithms
  • software development
  • managing people or teams
  • leading projects
  • designing, analyzing, and troubleshooting distributed systems
  • building and developing large-scale infrastructure or distributed systems
  • machine learning infrastructure

Nice to have

  • Master’s degree in Computer Science or Engineering

What the JD emphasized

  • AI-powered systems
  • Shopping AI
  • Machine Learning Infrastructure