Cloud Operations Engineer

Glean · Enterprise · Engineering

Glean is looking for a Cloud Operations Engineer to ensure customer infrastructure stability and manage application support issues for its AI-powered knowledge assistant platform. This role involves owning customer environments, troubleshooting alerts, coordinating releases, setting up new projects, remediating security vulnerabilities, assisting with cloud best practices (including LLM setup), and representing customer needs for product improvements. The role requires strong technical problem-solving skills, cloud certifications (GCP, AWS, or Azure), CI/CD experience, knowledge of cloud networking and security, SQL, Kubernetes, Linux, Terraform, and coding/debugging experience in Java and Python.

What you'd actually do

  1. Own the infrastructure stability for your designated customer(s). Assist the Engineering teams in monitoring alerts and troubleshooting errors in the customer’s Glean environment.
  2. Actively engage in any major customer incident, and write up, deliver, and lead the customer review of any post-incident RCA documents.
  3. Coordinate and execute software releases based on agreed-upon processes and maintenance windows.
  4. Set up new customer projects following Glean’s architectural design and best practices. Complete project setup in restricted environments, including running Terraform or other setup scripts manually.
  5. Remediate any security vulnerabilities in your customers’ projects.

Skills

Required

  • Experience and certifications in Cloud technologies in at least one of the following: Google Cloud Platform (GCP), Amazon Web Services (AWS), or Microsoft Azure
  • Experience with deployment and release using CI/CD and standard deployment frameworks in a production environment
  • Deep knowledge of cloud network and security concepts with practical implementation experience
  • Knowledge of SQL and databases, basic Kubernetes, and intermediate-to-advanced Linux. Familiarity with infrastructure-as-code tools like Terraform is essential
  • Technical problem-solving skills, including the ability to troubleshoot and isolate issues to their root cause in cloud environments
  • Ability to debug issues including searching & reading application logs, analyzing stack traces and browser trace files
  • 2+ years of coding and debugging experience in Java and Python

Nice to have

  • Assist customers with cloud best practices with respect to the Glean implementation, including org policies, IAM setup, quotas, Disaster Recovery, and LLM setup

What the JD emphasized

  • requires additional background screenings/clearances/training/certification
  • carrying and using customer-provided equipment
  • extended on-call shift timing based on customer contractual obligations