Senior Research Scientist, Nemotron Post-training

NVIDIA · Semiconductors · Santa Clara, CA +1 · Remote

Research Scientist/Engineer role at NVIDIA focused on building Nemotron models, specifically post-training pipelines, synthetic data, agentic RL, data/training infrastructure, and large-scale model post-training. The role involves advancing open-source foundation models, developing training data, benchmarks, LLMs, and software, and solving end-to-end foundation model post-training challenges. Requires a Master's or PhD (or equivalent experience) and 5+ years of experience in model post-training, RL, and agentic systems, with experience in data curation, model training, and inference/deployment environments.

What you'd actually do

  1. You will be a core contributor to Nemotron model post-training, working at the intersection of four areas: 1) synthetic data and algorithmic research for agentic RL; 2) data and training infrastructure implementation; 3) collaboration on vendor data acquisition and experimentation; 4) large-scale research and production model post-training
  2. Advance open-source foundation models by developing training data, benchmarks, LLMs, and software (including [NeMo-RL](https://github.com/NVIDIA-NeMo/RL), NeMo-Gym, and yet-to-be-announced software)
  3. Solve large-scale, end-to-end foundation model post-training challenges spanning the full model lifecycle, from initial orchestration and data pre-processing, through model training and tuning, to model deployment
  4. Publish and present your results at academic and industry conferences

Skills

Required

  • Master's or PhD degree in computer science, machine learning, or another quantitative field (or equivalent experience)
  • 5+ years of working or research experience in model mid-training / post-training, reinforcement learning, and agentic systems
  • Hands-on experience in data curation and model training for agentic and reasoning capabilities
  • In-depth experience using or developing inference and deployment environments such as vLLM, SGLang, or TRT-LLM

Nice to have

  • Industry experience in reinforcement learning for leading foundation models
  • Experience optimizing model quality using feedback from real-world traffic

What the JD emphasized

  • 5+ years of working or research experience in model mid-training / post-training, reinforcement learning, and agentic systems
  • Hands-on experience in data curation and model training for agentic and reasoning capabilities
  • In-depth experience using or developing inference and deployment environments such as vLLM, SGLang, or TRT-LLM

Other signals

  • post-training pipelines
  • foundation models
  • open-source generative AI
  • agentic RL
  • large-scale research & production model post-training