Senior High-performance AI Training Engineer

NVIDIA · Semiconductors · Santa Clara, CA

Senior engineer who optimizes AI training workloads across NVIDIA's hardware and software stack, from drivers to DL frameworks, influencing the hardware/software roadmap and contributing to MLPerf Training benchmarks.

What you'd actually do

  1. Understand, analyze, profile, and optimize AI training workloads on new hardware and software platforms, identifying fundamental performance limiters.
  2. Prioritize and solve performance issues across the key AI model training tasks, with the goal of pushing the end-to-end performance towards the physical limits.
  3. Implement production-quality software across multiple layers of NVIDIA's deep learning platform stack, from drivers to DL frameworks.
  4. Build and support NVIDIA submissions for MLPerf Training benchmarks.
  5. Implement key DL training workloads in NVIDIA's proprietary processor and system simulators to enable future architecture studies.

Skills

Required

  • C++
  • Python
  • CUDA
  • Deep learning training
  • Computer architecture
  • GPU fundamentals
  • Application performance analysis and tuning
  • Processor and system-level performance modeling

Nice to have

  • PhD in CS, EE or CSEE (or equivalent experience) with 5+ years of relevant experience
  • Or MS with 8+ years of experience

What the JD emphasized

  • performance analysis and optimization
  • squeeze every last clock cycle
  • AI training
  • GPU architecture
  • application code
  • peak performance
  • hardware and software stack
  • performance limiters
  • end-to-end performance towards the physical limits
  • drivers to DL frameworks
  • MLPerf Training benchmarks
  • processor and system simulators
  • future architecture studies
  • automate workload analysis, optimization
  • PhD in CS, EE or CSEE (or equivalent experience) with 5+ years of relevant experience; or MS with 8+ years of experience.
  • Strong background in deep learning and neural networks, particularly in training.
  • Solid understanding of computer architecture and familiarity with GPU fundamentals.
  • Proven background in analyzing and tuning application performance.
  • Proven experience with processor and system-level performance modeling.
  • Proficiency in programming with C++, Python, and CUDA.

Other signals

  • Optimizing AI training workloads on new hardware and software platforms
  • Pushing end-to-end performance towards physical limits
  • Implementing production-quality software across multiple layers of NVIDIA's deep learning platform stack
  • Building and supporting NVIDIA submissions for MLPerf Training benchmarks