Senior Research Scientist, Multimodal Foundation Models and Robotics

NVIDIA · Semiconductors · Santa Clara, CA

Research Scientist role focused on building multimodal foundation models and systems for humanoid robots and embodied agents, spanning algorithm design, large-scale training and inference, and deployment in physical simulation and on robot hardware.

What you'd actually do

  1. Design and implement novel AI algorithms and models for general-purpose humanoid robots and embodied agents;
  2. Develop large-scale AI training and inference methods for foundation models;
  3. Optimize and deploy AI models in physical simulation and on robot hardware;
  4. Collaborate with research and engineering teams across all of NVIDIA to transfer research to products and services.

Skills

Required

  • Ph.D. in Computer Science/Engineering, Electrical Engineering, etc., or equivalent research experience
  • 5 years of relevant work/research experience
  • Hands-on training experience and publications in multimodal foundation models (LLMs, vision-language models, video generative models, diffusion algorithms, action-based transformers)
  • Outstanding engineering skills in rapid prototyping and model training frameworks (PyTorch, JAX, TensorFlow, etc.)
  • Python is required
  • Excellent skills in working with large-scale machine learning/AI systems and compute infrastructure
  • Hands-on training experience and publications in robot learning (reinforcement learning, imitation learning, classical control methods)
  • Strong programming skills in Python, C++, ROS, and machine learning frameworks like PyTorch
  • Deep understanding of robot kinematics, dynamics, and sensors
  • Ability to safely operate robot hardware, lab equipment, and tools
  • Knowledge of control methods, including PID, model predictive control, and whole-body control
  • Familiarity with physics simulation frameworks such as MuJoCo and Isaac Sim

Nice to have

  • C++ and CUDA proficiencies
  • Robot hardware design and hands-on building experience

What the JD emphasized

  • Hands-on training experience and publications in at least one of the following topics: LLMs; Large vision-language models; Video generative models and diffusion algorithms; or Action-based transformers.
  • Hands-on training experience and publications in robot learning, such as reinforcement learning, imitation learning, or classical control methods.

Other signals

  • humanoid robot foundation models
  • general-purpose embodied agents
  • large-scale robot learning
  • multimodal foundation models