Senior AI Frameworks Engineer

NVIDIA · Semiconductors · Santa Clara, CA +1

NVIDIA is seeking a Senior AI Frameworks Engineer to contribute to the CUTLASS project, focusing on developing a Pythonic interface for high-performance GPU computations. The role involves designing APIs, building compilation infrastructure, optimizing developer experience, and managing production-grade delivery for the open-source community.

What you'd actually do

  1. Design APIs that prioritize user productivity, providing a "native" feel for developers accustomed to modern scientific computing and deep learning frameworks.
  2. Develop robust compilation infrastructure—including AST transformations and JIT-friendly execution—to lower Pythonic descriptions into high-performance GPU machine code.
  3. Optimize developer experience by creating debugging tools, profiler integrations, and validation methodologies that make writing and using kernels easy.
  4. Build production-grade delivery infrastructure for the open-source community, managing everything from package distribution (wheels, conda) to the user-facing documentation and testing.

Skills

Required

  • Python
  • C++
  • Python extensions
  • foreign function interfaces (FFI)
  • library or framework development
  • API design
  • compilation infrastructure
  • AST transformations
  • JIT execution
  • debugging tools
  • profiler integrations
  • validation methodologies
  • package distribution
  • documentation
  • testing

Nice to have

  • Active maintainer status or significant contributions to high-performance open-source libraries
  • AI frameworks
  • compiler projects (LLVM/MLIR)
  • compiler foundations
  • intermediate representations (IR)
  • lowering passes
  • AST manipulation
  • GPU architecture
  • parallel programming models (CUDA)

What the JD emphasized

  • MS or PhD degree in Computer Science, Electrical Engineering, or related field (or equivalent experience).
  • At least 3 years of relevant experience.
  • Strong proficiency in Python and C++, specifically regarding the design of Python extensions and foreign function interfaces (FFI).
  • Experience in library or framework development, with a focus on creating intuitive APIs for complex technical systems.
  • Deep understanding of the Python ecosystem’s delivery stack, including building, testing, and distributing high-performance compiled extensions.

Other signals

  • building the next frontier of this ecosystem
  • Pythonic CUTLASS (CUTLASS DSL)
  • bridge the gap between low-level hardware primitives and high-level developer productivity
  • high-performance math primitives
  • NVIDIA GPUs