Senior Site Reliability Engineer

Redfin · Seattle · Detroit, MI

This role focuses on designing, building, and operating scalable cloud infrastructure and AI-driven tooling to improve system reliability and operational efficiency. The engineer will partner with infrastructure, observability, and incident management teams to identify opportunities for AI and intelligent automation in operational workflows, troubleshooting, and reliability practices. Key responsibilities include managing cloud platforms and Kubernetes and developing automation for infrastructure and operations; familiarity with AI concepts for operational automation is a plus.

What you'd actually do

  1. Design, build, and operate scalable cloud infrastructure supporting Linux and Windows workloads
  2. Develop and implement AI-powered automation and tooling to improve system reliability and operational efficiency
  3. Troubleshoot cloud and distributed systems issues across AWS or on-premises environments
  4. Manage and optimize containerized environments using Kubernetes
  5. Support and maintain corporate identity and access management infrastructure

Skills

Required

  • 5+ years of experience in infrastructure engineering, systems administration, or platform engineering
  • 3+ years of experience in systems engineering or reliability engineering
  • Experience operating cloud platforms (AWS, Azure, or GCP)
  • Strong experience with container orchestration platforms such as Kubernetes
  • Proficiency with scripting and programming languages (Python, .NET, JavaScript)
  • Experience building automation tooling for infrastructure or operations
  • Experience with observability, monitoring, and incident response tooling
  • Strong troubleshooting skills across distributed systems and cloud infrastructure

Nice to have

  • Familiarity with AI tooling or AI concepts for operational automation

What the JD emphasized

  • AI-driven tooling and automation
  • AI-powered automation and tooling
  • AI-assisted automation
  • Familiarity with AI tooling or AI concepts for operational automation

Other signals

  • improve system reliability
  • engineering productivity
  • streamline operational workflows
  • improve troubleshooting
  • enhance reliability practices