ML Kernel Performance Engineer, AWS Neuron, Annapurna Labs

Amazon · Big Tech · Cupertino, CA · Software Development

The role focuses on optimizing ML kernel performance in the AWS Neuron SDK for AWS's custom ML accelerators, Inferentia and Trainium. It involves designing and implementing high-performance compute kernels, analyzing and resolving kernel-level performance bottlenecks, implementing compiler optimizations, and collaborating with customers and internal teams to enable and optimize their ML models. The work sits at the hardware-software boundary, combining deep hardware knowledge with ML expertise.

What you'd actually do

  1. Design and implement high-performance compute kernels for ML operations, leveraging the Neuron architecture and programming models
  2. Analyze and optimize kernel-level performance across multiple generations of Neuron hardware
  3. Conduct detailed performance analysis using profiling tools to identify and resolve bottlenecks
  4. Implement compiler optimizations such as fusion, sharding, tiling, and scheduling
  5. Work directly with customers to enable and optimize their ML models on AWS accelerators
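To make the compiler-optimization duties in item 4 concrete, here is a minimal, generic sketch of loop tiling, one of the techniques the posting names. This is plain NumPy for illustration only; it does not use the Neuron SDK or any AWS API. The function name and tile size are assumptions chosen for the example. The idea is the same one a kernel compiler applies: break a large matrix multiply into blocks small enough for each working set to stay in fast on-chip memory.

```python
import numpy as np

def tiled_matmul(a: np.ndarray, b: np.ndarray, tile: int = 32) -> np.ndarray:
    """Multiply a (M, K) by b (K, N) one tile at a time.

    Illustrative only: a real accelerator compiler would pick tile
    sizes from the hardware's on-chip buffer capacity and fuse
    neighboring ops into the same loop nest.
    """
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((m, n), dtype=a.dtype)
    # Iterate over output blocks; each inner step touches only one
    # (tile x tile) slice of each operand, bounding the working set.
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):
                out[i:i + tile, j:j + tile] += (
                    a[i:i + tile, p:p + tile] @ b[p:p + tile, j:j + tile]
                )
    return out
```

Fusion, sharding, and scheduling extend the same theme: restructuring the loop nest and data layout so the accelerator's compute units stay fed from local memory.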

Skills

Required

  • low-level optimization
  • system architecture
  • ML model acceleration
  • high-performance computing
  • distributed architectures
  • compiler optimizations (fusion, sharding, tiling, scheduling)
  • profiling tools
  • deep hardware knowledge
  • ML expertise

Nice to have

  • experience with AWS Neuron SDK
  • experience with PyTorch
  • experience with Inferentia and Trainium accelerators

What the JD emphasized

  • high-performance kernels
  • ML functions
  • AI acceleration
  • ML compiler
  • ML inference
  • ML model acceleration
  • ML workloads
  • ML accelerators
  • machine learning
  • ML models

Other signals

  • AWS Neuron SDK
  • accelerate deep learning and GenAI workloads
  • custom machine learning accelerators
  • high-performance kernels for ML functions
  • push the boundaries of what's possible in AI acceleration