Site Reliability Engineer III

F5 · Enterprise · Hyderabad, India

This Site Reliability Engineer III role sits on the Unified Demo Framework (UDF) platform team and focuses on launching and managing the F5 Guardrails and Redteam product lines. It involves designing, deploying, and supporting Kubernetes environments for AI workloads, optimizing system performance, and ensuring reliability. Responsibilities include Kubernetes orchestration, observability, automation, and collaboration with product teams.

What you'd actually do

  1. Design, deploy, and manage Kubernetes clusters and ensure efficient container orchestration to support AI workloads.
  2. Design and implement observability pipelines for real-time monitoring of Kubernetes clusters, including metrics collection for scaling, resource utilization, and system health.
  3. Automate infrastructure management tasks to support the efficient deployment and operation of AI functionalities, including upgrades, scaling, and provisioning.
  4. Collaborate with product teams and sales engineering to integrate F5 products into the UDF platform and ensure effective utilization by the sales organization.

Skills

Required

  • Kubernetes orchestration
  • containerized architectures
  • Kubernetes clusters
  • containerized workloads
  • Kubernetes environments in AWS
  • EKS
  • monitoring and observability tools
  • CloudWatch
  • Grafana
  • Fluentd
  • DataDog
  • Infrastructure-as-Code (IaC) tools
  • Terraform
  • Helm
  • CloudFormation
  • CI/CD frameworks
  • networking
  • storage
  • compute infrastructure
  • Python
  • Go
  • Bash
  • automation
  • system integration
  • applying security best practices to Kubernetes environments
  • data protection
  • resource access controls
  • GPU-based workloads in Kubernetes environments
  • optimization strategies for AI-based workloads
  • orchestrating complex network environments
  • troubleshooting complex network environments
  • best practices for complex network environments
  • optimizing complex network environments in AWS
  • optimizing complex network environments in GCP VPCs

Nice to have

  • Certified Kubernetes Administrator (CKA)
  • Certified Kubernetes Application Developer (CKAD)
  • AWS Certified Solutions Architect
  • GCP Cloud Architect certifications
  • advanced Kubernetes tools
  • service mesh technologies
  • Istio
  • Linkerd
  • Kubernetes operators for machine learning workflows
  • distributed computing concepts

What the JD emphasized

  • AI workloads
  • Kubernetes
  • AWS

Other signals

  • support AI workloads
  • instantiate AI features
  • optimization strategies for AI-based workloads
  • Kubernetes environments