Senior Research Scientist, Multi-modal Language Models

NVIDIA · Semiconductors · Santa Clara, CA

Senior Research Scientist at NVIDIA focused on Multi-Modal Language Models, driving Nemotron technology. The role involves improving model abilities, generalization, and efficiency through data synthesis, retraining, and developing training recipes for mixed modalities (text, image, video, audio). It also includes translating research into production, exploring evaluation paradigms, and contributing to open-source communities.

What you'd actually do

  1. Driving new abilities into the model
  2. Improving generalization of existing functionalities by understanding weak points, designing a data synthesis solution, and retraining models
  3. Developing recipes for training models that mix multiple modalities, such as text, image, video, and audio
  4. Designing solutions that improve Pareto efficiency
  5. Collaborating with researchers to translate cutting-edge ideas into production-ready implementations

Skills

Required

  • PhD in Computer Science, Electrical Engineering, or a related field, or equivalent research experience in LLMs, systems, or related areas.
  • 4+ years of experience in computer vision, especially multi-modal LLMs.
  • Proficiency in Python with hands-on experience in frameworks such as PyTorch.
  • Solid background in computer science fundamentals: algorithms, data structures, parallel/distributed computing, and systems programming.
  • Proven ability to collaborate across research and engineering teams in multifaceted environments.

Nice to have

  • Specific multi-modal LLM research experience.
  • Experience developing and scaling large distributed systems for deep learning.
  • Contributions to open-source LLM systems or large-scale AI infrastructure.

What the JD emphasized

  • multi-modal language models
  • retraining models
  • training models that mix multiple modalities

Other signals

  • open-source
  • state of the art