Sr. Site Reliability Engineer (US Federal)

Workday · Enterprise · USA.VA.Reston

This role is for a Sr. Site Reliability Engineer supporting Workday's Federal Platform Engineering team. The engineer will be responsible for operating, monitoring, automating, maintaining, and providing metrics and observability for a Kubernetes-based platform. The role involves infrastructure automation, CI/CD pipelines, incident handling, and ensuring high availability, scalability, and security of core platform components. While the company mentions AI and agents, the core function of this role is infrastructure and platform reliability engineering, not direct AI/ML model development or deployment.

What you'd actually do

  1. Keep the Workday Kubernetes-based platform maintained, healthy, and highly available for customers through infrastructure automation, CI/CD pipelines, reporting, incident handling and response, and observability tools.
  2. Maintain core platform components, ensuring high availability, scalability, and security.
  3. Automate infrastructure provisioning, configuration management, and application deployments using tools like Terraform and Argo CD.
  4. Provide support for platform-related issues, working closely with development teams to resolve them.
  5. Implement and maintain standard security practices for the platform, ensuring compliance with industry standards.

Skills

Required

  • 5 years of hands-on experience working with large-scale cloud infrastructure, automation, and overall DevOps methodologies
  • Bachelor's degree in a computer-related field or equivalent work experience
  • Proficiency in infrastructure automation tools like Terraform
  • Experience with building, maintaining, and consuming CI/CD pipelines and tools like Argo CD
  • Strong analytical and problem-solving skills
  • Excellent communication and collaboration skills
  • Strong technical writing skills for creating comprehensive documentation on system architecture, operations, and reliability practices
  • Proven ability to troubleshoot complex system issues

Nice to have

  • Active TS/SCI w/CI Poly

What the JD emphasized

  • U.S. Federal Government
  • United States citizens
  • security clearance
  • TS/SCI w/CI Poly level
  • TS/SCI w/CI Poly is preferred