Site Reliability Engineer - Tactical Reconnaissance & Strike

Anduril · Defense · Atlanta, GA · Tactical Recon & Strike : Tactical Recon & Strike Engineering : TRS - Systems Engineering

Site Reliability Engineer role at a defense technology company that builds autonomous drones and rocket motors. Responsibilities include deploying and managing cloud environments, integrating platform services, maintaining data pipelines and observability infrastructure, and supporting field operations. The role requires strong skills in cloud platforms, CI/CD, infrastructure as code (IaC), containerization, and monitoring tools.

What you'd actually do

  1. Own and execute customer and developmental cloud deployments across TRS product lines, ensuring reliable configuration management, version control, and seamless promotion of releases from development through production environments.
  2. Evaluate, prototype, and integrate emerging platform capabilities (such as RDF and MissionSim) and/or third-party services (such as Arena AI and AFATDS/AXS) to improve data discoverability, consistency, and analytical capabilities across TRS systems.
  3. Maintain and enhance existing data pipelines, metrics frameworks, and monitoring solutions, including Grafana and Nominal, ensuring high availability, data quality, and actionable insights for engineering and operations teams.
  4. Collaborate directly with field operations teams during feature rollouts to conduct real-world testing, troubleshoot issues in operational environments, and gather actionable feedback that informs system improvements and supports mission success. Enable customers to self-serve provisioning of their own environments.
  5. Partner with leadership to establish integration engineering functions and best practices across all TRS product lines, developing reusable patterns, documentation, and tooling that accelerate deployment capabilities and operational maturity.

Skills

Required

  • Python
  • CI/CD and version control tools such as GitHub Actions, JFrog Artifactory, and Git
  • IaC tools (Terraform, Ansible)
  • cloud platforms (Azure, AWS, GCP)
  • containerization (Docker)
  • container orchestration (Kubernetes)
  • logging and monitoring tools (Nominal and Grafana)
  • parallel computing frameworks (CUDA, OpenCL)
  • collaborative tools (Jira, Confluence)
  • eligibility to obtain and maintain a U.S. Secret security clearance

Nice to have

  • networking
  • cloud technologies
  • application development
  • hardware design
  • cybersecurity

What the JD emphasized

  • Eligible to obtain and maintain an active U.S. Secret security clearance.