Senior Production Engineer

Anduril · Defense · Costa Mesa, CA +1 · AFS : Counter Intrusion Engineering : Software Engineering

This Senior Production Engineer role focuses on diagnosing and fixing stability vulnerabilities in core platform services at a defense technology company. The work involves implementing resilience patterns in production Go code and debugging complex failure modes across service boundaries. It requires strong experience with distributed systems and Kubernetes.

What you'd actually do

  1. Diagnose and fix stability vulnerabilities in core platform services that cause cascading failures under multi-replica, multi-tenant operation
  2. Implement resilience patterns (leader election, circuit breakers, failure domain isolation) directly in service code
  3. Design multi-replica support for services that currently assume single-instance operation
  4. Collaborate with service owners on contract testing and upgrade validation
  5. Trace cascading failures across service boundaries and drive them to root-cause fixes

Skills

Required

  • Production Go
  • Distributed systems
  • Kubernetes
  • Debugging complex systems

Nice to have

  • Rust
  • Experience with reliability problems in production services
  • gRPC service architectures
  • HashiCorp Consul
  • FedRAMP/IL5 compliance environment experience
  • ArgoCD / GitOps workflows

What the JD emphasized

  • Production-quality Go
  • Distributed systems
  • Debugging complex systems
  • U.S. Person