What you'd actually do

Understand, analyze, profile, and optimize AI training workloads on new hardware and software platforms, identifying fundamental performance limiters.

Prioritize and solve performance issues across the key AI model training tasks, with the goal of pushing the end-to-end performance towards the physical limits.

Implement production-quality software across multiple layers of NVIDIA's deep learning platform stack, from drivers to DL frameworks.

Build and support NVIDIA submissions for MLPerf Training benchmarks.

Implement key DL training workloads in NVIDIA's proprietary processor and system simulators to enable future architecture studies.

What the JD emphasized

performance analysis and optimization

squeeze every last clock cycle

AI training

GPU architecture

application code

peak performance

hardware and software stack

performance limiters

end-to-end performance towards the physical limits

drivers to DL frameworks

MLPerf Training benchmarks

processor and system simulators

future architecture studies

automate workload analysis, optimization

PhD in CS, EE or CSEE (or equivalent experience) with 5+ years of relevant experience; or MS with 8+ years of experience.

Strong background in deep learning and neural networks, particularly in training.

Solid understanding of computer architecture and familiarity with GPU fundamentals.

Proven background in analyzing and tuning application performance.

Proven experience with processor and system-level performance modeling.

Proficiency in programming with C++, Python, and CUDA.

Other signals

Optimizing AI training workloads on new hardware and software platforms

Pushing end-to-end performance towards physical limits

Implementing production-quality software across multiple layers of NVIDIA's deep learning platform stack

Building and supporting NVIDIA submissions for MLPerf Training benchmarks

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology—and amazing people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN, you’ll be immersed in a diverse, supportive environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world.

NVIDIA is looking for senior engineers who are obsessed with performance analysis and optimization to help us squeeze every last clock cycle out of AI training, the workload driving the design and construction of the largest and most powerful compute systems in the world. If you are willing to work across all layers of the hardware/software stack - from GPU architecture to the application code - to achieve peak performance, we want to hear from you. This role offers the opportunity to directly impact the hardware and software roadmap in a fast-growing technology company that leads the AI revolution. Join us and help design and build the world's most powerful compute systems!

What you will be doing:

Understand, analyze, profile, and optimize AI training workloads on new hardware and software platforms, identifying fundamental performance limiters.
Prioritize and solve performance issues across the key AI model training tasks, with the goal of pushing the end-to-end performance towards the physical limits.
Implement production-quality software across multiple layers of NVIDIA's deep learning platform stack, from drivers to DL frameworks.
Build and support NVIDIA submissions for MLPerf Training benchmarks.
Implement key DL training workloads in NVIDIA's proprietary processor and system simulators to enable future architecture studies.
Develop tools to automate workload analysis, optimization, and other critical workflows.

What we want to see:

PhD in CS, EE or CSEE (or equivalent experience) with 5+ years of relevant experience; or MS with 8+ years of experience.
Strong background in deep learning and neural networks, particularly in training.
Solid understanding of computer architecture and familiarity with GPU fundamentals.
Proven background in analyzing and tuning application performance.
Proven experience with processor and system-level performance modeling.
Proficiency in programming with C++, Python, and CUDA.

Widely considered to be one of the technology world’s most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. As you plan your future, see what we can offer to you and your family at www.nvidiabenefits.com/.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until February 16, 2026.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Senior High-performance AI Training Engineer
