Senior/Staff Deep Reinforcement Learning Engineer

DoorDash · Consumer · San Francisco, CA · 311 Autonomy

This role is for a Senior/Staff Deep RL Engineer who will design, train, and deploy deep reinforcement learning policies for real-time driving decisions in autonomous vehicles. It involves full-lifecycle ownership, from problem formulation and reward design to large-scale distributed training and on-vehicle inference, with a focus on an end-to-end JAX stack for rapid deployment.

What you'd actually do

  1. Formulate complex driving tasks as RL problems with well-shaped reward functions and expressive state/action representations.
  2. Design and train model-based deep RL agents using GPU-accelerated simulation at massive scale, and improve the simulator itself.
  3. Build and maintain distributed training infrastructure in JAX across large compute clusters.
  4. Build agentic optimization systems that automatically improve code, run experiments, analyze metrics, and iterate on RL policies with minimal human intervention.

Skills

Required

  • BS/MS/PhD in CS, EE, Robotics, or a related field
  • Strong foundation in reinforcement learning and deep learning
  • Hands-on experience training RL agents at scale
  • Proficiency in JAX or a similar functional ML framework
  • Comfort with JIT compilation, vectorized environments, and GPU-accelerated simulation
  • Deep grasp of core RL concepts: policy gradients, value functions, exploration-exploitation, model-based RL, reward shaping, and sim-to-real transfer
  • Data-driven mindset: comfortable building experiment pipelines, analyzing training runs, and letting metrics guide architectural decisions

Nice to have

  • Publications at top venues (NeurIPS, ICML, ICLR, CoRL, RSS, ICRA) on RL or learned planning
  • Experience building or working with GPU-accelerated simulators for RL training
  • Track record of shipping a learned component in a production robotics or autonomous vehicle stack

What the JD emphasized

  • deep reinforcement learning
  • real-time driving decisions
  • autonomous vehicles
  • full lifecycle
  • reward design
  • large-scale distributed training
  • on-vehicle inference
  • JAX
  • robotics
  • autonomous driving
  • real-time decision-making
  • GPU-accelerated simulation
  • shipping a learned component in a production robotics or autonomous vehicle stack

Other signals

  • deep reinforcement learning
  • autonomous vehicles
  • real-time driving decisions
  • full lifecycle ownership
  • large-scale distributed training
  • on-vehicle inference