Senior High Performance AI Engineer

NVIDIA · Semiconductors · Santa Clara, CA +5 · Remote

NVIDIA is seeking a Senior High Performance AI Engineer to build multi-agent systems for the CUDA ecosystem, with a focus on agentic runtimes, compiler-integrated orchestration, and GPU acceleration for agent workloads such as planning, tool use, and code generation. The role collaborates across the AI stack, from hardware to model and agent teams.

What you'd actually do

  1. Design, build and optimize agentic AI systems for the CUDA ecosystem.
  2. Co-design agentic system solutions with software, hardware and algorithm teams; influence and adopt new capabilities as they become available.
  3. Develop reproducible, high-fidelity evaluation frameworks covering performance, quality and developer productivity.
  4. Collaborate across the AI stack—from hardware through compilers/toolchains, kernels/libraries, frameworks, distributed training, and inference/serving—and with model/agent teams.

Skills

Required

  • AI systems development
  • building foundational models, agents or orchestration frameworks
  • deep learning frameworks
  • modern inference stacks
  • C/C++
  • Python
  • software engineering fundamentals
  • GPU programming
  • performance optimization
  • CUDA

Nice to have

  • MS or PhD
  • optimizing and deploying high-performance models
  • resource-constrained platforms
  • benchmark wins or published results
  • publications or open-source leadership in deep learning, multi-agent systems, reinforcement learning, or AI systems
  • contributions to widely used repos or standards

What the JD emphasized

  • building groundbreaking multi-agent systems for the CUDA ecosystem
  • innovative agentic runtimes and compiler-integrated orchestration
  • accelerate agent planning, tool use, and code generation
  • Strong C/C++ and Python programming skills
  • Experience with GPU programming and performance optimization (CUDA or equivalent)
  • Track record building/evaluating deep learning models, coding agents and developer tooling
  • Demonstrated ability to optimize and deploy high-performance models
  • Deep expertise in GPU performance optimizations
  • Publications or open-source leadership in deep learning, multi-agent systems, reinforcement learning, or AI systems

Other signals

  • multi-agent systems
  • agentic runtimes
  • orchestration
  • foundational models
  • GPU acceleration