Senior Software Engineer, Generative AI Research

NVIDIA · Semiconductors · Santa Clara, CA

NVIDIA is seeking a Senior Software Engineer for Generative AI Research to build and operate scalable infrastructure for training Cosmos, its world foundation model for physical AI. The role involves designing and developing high-throughput systems for data processing, retrieval, and workflow orchestration; improving system reliability and performance; and contributing to long-term infrastructure strategy for training, data management, and large-scale compute efficiency. It requires a strong engineering background in distributed systems, ML infrastructure, or large-scale compute/data platforms, proficiency in Python and at least one systems language (C++, Go, or Rust), and experience with orchestration systems and data pipelines. Experience with large-scale model training infrastructure, distributed compute, synthetic data, or multimodal datasets is a plus.

What you'd actually do

  1. Design, build, and operate scalable infrastructure for training Cosmos and supporting large-scale data pipelines
  2. Develop high-throughput systems for data processing, retrieval, and workflow orchestration
  3. Collaborate across research, optimization, and platform teams to accelerate experiments and deployments
  4. Improve system reliability, performance, and observability across distributed compute environments
  5. Contribute to long-term infrastructure strategy for training, data management, and large-scale compute efficiency

Skills

Required

  • Master's degree in Computer Science, Computer Engineering, or a related STEM field, or equivalent experience
  • 6 years of relevant work experience
  • Python
  • At least one systems language (e.g., C++, Go, or Rust)
  • distributed systems
  • ML infrastructure
  • large-scale compute/data platforms
  • orchestration systems
  • scheduling
  • scalable storage
  • data pipelines
  • bridging research workflows and production-grade systems

Nice to have

  • Experience building or optimizing infrastructure for large-scale model training
  • Hands-on work with distributed compute environments or high-performance systems
  • Familiarity with synthetic data, simulation pipelines, or large multimodal datasets
  • Contributions to open-source infrastructure or large-scale internal tooling

What the JD emphasized

  • Master's degree in Computer Science, Computer Engineering, or a related STEM field, or equivalent experience
  • 6 years of relevant work experience
  • Proficiency in Python and at least one systems language (e.g., C++/Go/Rust)
  • Experience with orchestration systems, scheduling, and scalable storage or data pipelines
  • Comfortable bridging research workflows and production-grade systems

Other signals

  • building systems that make it possible to train Cosmos
  • enables large-scale AI models for robots, autonomous agents, and AI systems to understand, plan, and act in complex environments
  • develops the Cosmos platform infrastructure that powers model training, data pipelines, simulation, and deployment at scale