Cloud Support Engineer (SRE Development)

F5 · Enterprise · Field-CO

This role is for a Cloud Support Engineer focused on Site Reliability Engineering (SRE) in a SaaS environment. The candidate will run, support, and scale an AI Security Public SaaS platform, including operating AI inference workloads at scale. Key responsibilities include proactive monitoring, customer-centric incident response, collaboration with development teams, and contributions to scalable infrastructure for AI inference.

What you'd actually do

  1. Monitor key SaaS application metrics, logs, and alerts to proactively identify and prevent service disruptions
  2. Serve as a primary point of contact for technical customer inquiries and issues
  3. Analyze metrics, logs, and incident reports to provide actionable insights to engineering teams
  4. Design and implement automation to streamline incident response and improve system reliability
  5. Contribute to building scalable, resilient infrastructure to support AI inference workloads

Skills

Required

  • Bachelor’s degree in Computer Science, Information Technology, or a related field (or equivalent practical experience)
  • 1–3+ years of experience in technical support, systems administration, or a similar role
  • Strong understanding of SaaS environments and cloud-based architectures (preferably AWS)
  • Proficiency in at least one scripting language (e.g., Python)
  • Solid understanding of web technologies (HTTP, REST APIs, JSON, etc.)
  • Experience working with ticketing systems
  • Strong problem-solving and analytical skills
  • Excellent written and verbal communication skills
  • Ability to work independently and collaboratively in a team environment
  • Willingness to learn new technologies and adapt to evolving requirements

Nice to have

  • Experience with monitoring and observability tools (e.g., Prometheus, Grafana)
  • Familiarity with configuration management tools (e.g., Terraform)
  • Experience working with cloud infrastructure technologies
  • Exposure to SRE principles and reliability engineering practices
  • Strong understanding of networking fundamentals
  • Experience with databases (PostgreSQL), operating systems (Linux), and Kubernetes

What the JD emphasized

  • operating AI inference workloads at scale

Other signals

  • run, support, and scale an AI Security Public SaaS platform
  • building scalable, resilient infrastructure to support AI inference workloads