Software Engineer, Training Platform

Anduril · Defense · Costa Mesa, CA (+2 other locations) · AFS: Discovery Engineering

This Software Engineer role builds the Frontier AI Agent Platform, an internal product for training, evaluating, observing, comparing, and deploying AI agents. The work spans full-stack development, API design, agent-evaluation workflows, and observability tooling, and sits at the intersection of product engineering, ML infrastructure, and evaluation tooling within the defense technology sector.

What you'd actually do

  1. Build full-stack applications for Anduril’s Frontier AI Agent Platform, including frontend interfaces, backend APIs, workflows, dashboards, metrics, and reporting tools.
  2. Create user-facing applications for ML researchers, software engineers, test and evaluation teams, and technical program stakeholders.
  3. Design and implement backend APIs for launching training runs, configuring evaluations, tracking experiments, observing agent behavior, and comparing performance across models, agents, policies, and baselines.
  4. Build clear, reliable workflows for evaluating AI agents across simulations, test harnesses, mission scenarios, and production-like environments.
  5. Develop metrics and reporting interfaces that make complex evaluation results understandable, actionable, and trustworthy.

Skills

Required

  • Strong full-stack engineering experience building production web applications, internal tools, or developer platforms.
  • Proven experience developing modern web user interfaces using React and JavaScript or TypeScript.
  • Experience designing and implementing backend APIs, including REST, gRPC, or similar service interfaces.
  • Ability to build user-friendly metrics, dashboards, and reporting workflows that accurately represent complex evaluations, training runs, experiments, and deployments.
  • Strong product judgment: able to turn messy technical workflows into clear interfaces that users can trust.
  • Ability to work closely with ML researchers, software engineers, and test and evaluation teams to understand their workflows and build reusable platform capabilities.
  • Strong engineering fundamentals around reliability, observability, data modeling, API design, and maintainable system architecture.

Nice to have

  • Experience building ML platforms, experiment tracking tools, evaluation dashboards, simulation tooling, or internal developer platforms.
  • Familiarity with LLM agents, reinforcement learning, post-training workflows, model evaluation, or agent observability.
  • Experience with metrics design, data visualization, experiment comparison, or decision-support tooling.
  • Experience with Kubernetes, Docker, workflow orchestration, distributed systems, or cloud infrastructure.
  • Familiarity with defense, robotics, autonomy, command-and-control, test and evaluation, or operational planning domains.

What the JD emphasized

  • build the Frontier AI Agent Platform
  • train, evaluate, observe, compare, and deploy AI agents
  • ML infrastructure
  • evaluation tooling
  • operational reporting
  • evaluating AI agents
  • agent observability