Site Reliability Engineer, Discovery

Anduril · Defense · Washington, DC · AFS

Site Reliability Engineer for a defense technology company focused on AI-native offerings, autonomy, and networking. The role involves ensuring the operational capabilities of complex systems, including cloud, robotics, and mesh networking architectures, by improving core product offerings, creating management tooling, and resolving system issues. The engineer will work with various teams to deploy and support systems for customers, focusing on scalable and fault-tolerant delivery.

What you'd actually do

  1. Strengthen Anduril’s operational capabilities by improving our core product offering through root cause analysis and by building tooling to manage large-scale deployments
  2. Drive continuous organizational improvement by leading post-mortem events involving diverse stakeholders
  3. Quickly diagnose and resolve system issues across cloud, robotics, and mesh networking architectures
  4. Lead the organization in building scalable, sustainable mechanisms to continue delivering to customers at the pace the business is scaling
  5. Design, develop, and deliver solutions using modern technologies that ensure scalable and fault-tolerant delivery of systems to the warfighter

Skills

Required

  • U.S. Top Secret security clearance
  • STEM degree or equivalent technical experience
  • networking
  • cloud technologies
  • application development
  • hardware design
  • cybersecurity
  • 5 years of operations and engineering experience
  • IaC tools (Terraform, Ansible)
  • cloud platforms (Azure, AWS, GCP)
  • containerization (Docker)
  • container orchestration (Kubernetes)
  • understand and navigate complex systems and established code bases

Nice to have

  • drive consensus across internal and external stakeholders
  • developing and delivering solutions to evolving problems in complex environments
  • developing and deploying autonomous weapon systems across command echelons
  • delivering and maintaining systems that run on air-gapped and security-hardened networks
  • building scalable solutions along with plans for implementation
  • data-driven root cause analysis on complex systems
  • understand, debug, and modify software written in languages such as Go, Python, Rust, or C++
  • excellent written and verbal communication skills

What the JD emphasized

  • Currently possesses and is able to maintain an active U.S. Top Secret security clearance
  • Minimum of 5 years of operations and engineering experience