Staff Infrastructure Reliability Engineer - Database & Storage

Redfin · Seattle, WA +2 · Remote

Staff Infrastructure Reliability Engineer focused on database and storage systems, responsible for technical leadership, strategy development, and ensuring system reliability, maintainability, and scalability in a cloud environment. The role involves architecting and implementing cloud database and storage solutions, supporting large-scale databases, leading incident resolution, and participating in on-call rotations. A strong emphasis is placed on using and evangelizing AI code generation tools for efficiency.

What you'd actually do

  1. You will help lead the database and storage strategy at Redfin, including architecture, management, and access patterns.
  2. You will lead complex technical discussions with a variety of audiences, including software and systems engineers and business leaders.
  3. You will architect and lead the implementation of cloud database and storage systems with a focus on reliability, observability, scalability, and security.
  4. You will support large-scale, high-volume databases, both self-managed and on specialized AWS managed offerings, including management activities such as upgrades, backups, recovery, and migrations.
  5. You will use and evangelize approved AI code generation tools to document, architect, and create code.

Skills

Required

  • 7+ years of experience managing systems in AWS or a similar cloud environment
  • 5+ years of experience with PostgreSQL or similar RDBMS, AWS Aurora/RDS, AWS S3, ElastiCache, OpenSearch, or DynamoDB
  • Proven history in architecting, building, scaling, and supporting cloud infrastructure technologies, specializing in database and storage services
  • Extensive experience with Linux administration and Linux scripting, including Python script development
  • Experienced mentor of other engineers
  • Commitment to best practices including infrastructure as code, configuration management tooling, and security practices
  • Deep knowledge and professional use of at least one AI code generation tool
  • Excellent communication skills
  • Understanding and implementation of core reliability principles, including monitoring, alerting, and incident management

Nice to have

  • Hybrid work from Seattle, San Francisco, or Detroit offices

What the JD emphasized

  • 7+ years of experience managing systems in AWS or a similar cloud environment
  • 5+ years of experience with at least one, but preferably more, of the following: PostgreSQL or similar RDBMS; AWS Aurora/RDS; AWS S3; ElastiCache; OpenSearch; DynamoDB
  • Deep knowledge and professional use of at least one AI code generation tool