Lead Site Reliability Engineer

Glean · Enterprise · Mountain View, CA · Engineering

Lead Site Reliability Engineer for Glean, an AI platform company. This role focuses on ensuring the reliability, availability, and performance of Glean's services and cloud infrastructure. Responsibilities include technical leadership, incident management, automation, performance optimization, and security/compliance collaboration. The role involves managing a team and contributing to the development of scalable cloud operations and SRE practices within a hybrid cloud environment.

What you'd actually do

  1. Foster a culture of engineering excellence, drive technical strategy, and develop a high-performing, collaborative team.
  2. Ensure our services meet stringent Service Level Objectives (SLOs) and build resilient, automated production environments in the cloud.
  3. Lead a team and be responsible for products globally, providing technical leadership to key projects and empowering your team to do the same.
  4. Manage the complex challenges of scale and fast growth that are unique to Glean, applying your expertise in coding, algorithms, problem-solving, and SRE practices.
  5. Keep Glean applications up and running, ensuring our customers have the best and most reliable experience possible.

Skills

Required

  • Site Reliability Engineering
  • cloud-based services and infrastructure management
  • software development in one or more programming languages
  • managing people or teams
  • leading projects
  • designing, analyzing, and troubleshooting distributed systems running in the cloud
  • Google Cloud Platform, AWS, or Azure
  • Docker
  • Kubernetes
  • Terraform
  • networking
  • security principles
  • SRE practices
  • security practices

What the JD emphasized

  • stringent Service Level Objectives (SLOs)
  • resilient, automated production environments
  • scale and fast growth
  • coding, algorithms, problem-solving, and SRE practices
  • best and most reliable experience possible