Agent RL Infra Engineer

NVIDIA · Semiconductors · Santa Clara, CA

NVIDIA is seeking an engineer to develop and productionize reinforcement learning (RL) capabilities for agent teams within an enterprise context. The role involves evaluating and adapting RL approaches, designing reward environments, operationalizing training backends, and integrating with existing ML services. Responsibilities include leading data curation, designing RL training loops, integrating with GPU infrastructure, building observability, and collaborating with various platform and customer teams. The ideal candidate has extensive experience in operationalizing fine-tuning and RL techniques, familiarity with distributed training frameworks and MLOps, and proficiency in relevant programming languages.

What you'd actually do

  1. Evaluate and adapt democratized RL approaches into reusable cookbooks and blueprints so agent developers can integrate self-improvement loops (GRPO, DPO, PPO, RLAIF) on their own
  2. Design verifiable reward environments building on NeMo Gym, extending to domain-specific environments for internal use cases
  3. Operationalize NVIDIA and third-party training backends as production services inside Sandbox
  4. Integrate with NeMo Microservices (Curator, Customizer, Evaluator, Guardrails) to enable end-to-end data flywheel workflows for RL
  5. Lead data curation and active learning strategies to continuously improve training data quality

Skills

Required

  • MS in CS, ML, or related field (or equivalent experience)
  • 10+ years of experience
  • Experience operationalizing fine-tuning methods (LoRA, SFT) and especially RL techniques (DPO, GRPO, PPO, RLAIF) into reusable cookbooks and self-service workflows
  • Familiarity with distributed training frameworks (e.g., Megatron, NeMo, DeepSpeed, FSDP, HF Accelerate), plus MLOps skills covering pipeline automation, job orchestration, and GPU cluster management
  • Proficiency in Python, Go, Rust, or similar
  • Background in CS, ML, or related field through formal education or equivalent experience

Nice to have

  • Building RL environments or training recipes that other teams consumed as self-service capabilities
  • Familiarity with NVIDIA infrastructure (DGX, AI Factory, NVLink/InfiniBand), NeMo Microservices, or the evolving RL-for-agents ecosystem (rLLM, Agent Lightning, HUD, OpenRLHF, SkyRL)
  • Experience with data curation, active learning, continuous learning loops, or data flywheel architectures

What the JD emphasized

  • operationalizing fine-tuning methods (LoRA, SFT) and especially RL techniques (DPO, GRPO, PPO, RLAIF) into reusable cookbooks and self-service workflows
  • distributed training frameworks (e.g., Megatron, NeMo, DeepSpeed, FSDP, HF Accelerate) and MLOps skills covering pipeline automation, job orchestration, and GPU cluster management
  • Proficiency in Python, Go, Rust, or similar

Other signals

  • reinforcement learning
  • agent teams
  • ML research and production engineering
  • enterprise-ready blueprints
  • sandboxed execution environments
  • security and governance