Software Engineer, Agent Platform

Anduril · Defense · Costa Mesa, CA +2 · AFS : Discovery Engineering

Backend software engineer to own and evolve an internal LLM agent framework, enabling teams to develop, evaluate, and deploy reliable LLM agents in mission-critical defense environments. This role involves building platform capabilities, supporting ML teams with post-training workflows, and designing tooling for modern agent patterns and evaluation suites.

What you'd actually do

  1. Own Anduril’s internal LLM agent framework, including core abstractions, runtime architecture, developer experience, integrations, and reliability.
  2. Support multiple business lines building LLM agents by providing new framework capabilities, implementation guidance, architectural reviews, and best-practice patterns.
  3. Partner with machine learning teams to make model post-training workflows easy to integrate, ranging from supervised fine-tuning to offline RL, online RL, and environment-driven agent improvement.
  4. Design tooling that supports modern agent patterns, including structured tool calling, filesystem-using agents, memory and retrieval, planning loops, subagents, agent graphs, and human-in-the-loop workflows.
  5. Work with partner teams to define comprehensive evaluation suites that measure task success, tool-call correctness, trajectory quality, robustness, regressions, and deployment readiness.

Skills

Required

  • backend engineering
  • production-quality platforms
  • frameworks
  • APIs
  • infrastructure
  • LLM agent framework design
  • orchestration patterns
  • agent evaluation paradigms
  • model post-training workflows
  • reliability
  • observability
  • debugging
  • safety for LLM applications

Nice to have

  • agent frameworks such as LangChain Deep Agents and the Claude SDK
  • evaluation platforms
  • simulation environments
  • benchmark suites
  • agent test harnesses
  • Kubernetes
  • Docker
  • distributed systems
  • workflow orchestration
  • ML infrastructure
  • defense
  • robotics
  • command-and-control systems
  • autonomy
  • operational planning domains

What the JD emphasized

  • Strong backend engineering experience building production-quality platforms, frameworks, APIs, or infrastructure used by other engineers.
  • Deep expertise in LLM agent framework design, including the tradeoffs between different orchestration patterns such as linear agents, graph-based agents, multi-agent systems, planner/executor loops, and tool-heavy agents.
  • Experience designing agent evaluation paradigms, including trajectory evaluations, LLM-as-judge workflows, task-success metrics, tool-call correctness checks, rubric-based qualitative grading, adversarial scenario testing, regression eval suites, and human-in-the-loop review.
  • Familiarity with model post-training workflows such as SFT, preference tuning, reinforcement learning, and environment-based agent training.
  • Strong judgment around reliability, observability, debugging, and safety for LLM applications deployed in high-stakes settings.

Other signals

  • LLM agent framework
  • agent architecture
  • model post-training
  • evaluation tooling
  • mission-critical environments