Software Engineering Manager, ML Kernel Performance, AWS Neuron, Annapurna Labs

Amazon · Big Tech · Cupertino, CA · Software Development

The Annapurna Labs team at AWS is seeking an Engineering Manager to lead a team focused on optimizing ML kernel performance for AWS Neuron, the SDK for its custom ML accelerators (Inferentia and Trainium). The role involves designing and implementing high-performance kernels, optimizing compiler and runtime performance, and working closely with customers to enable their ML models. This position operates at the hardware-software boundary, combining deep hardware knowledge with ML expertise to accelerate deep learning and GenAI workloads.

What you'd actually do

  1. Design and implement high-performance compute kernels for ML operations, leveraging the Neuron architecture and programming models
  2. Analyze and optimize kernel-level performance across multiple generations of Neuron hardware
  3. Conduct detailed performance analysis using profiling tools to identify and resolve bottlenecks
  4. Implement compiler optimizations such as fusion, sharding, tiling, and scheduling
  5. Work directly with customers to enable and optimize their ML models on AWS accelerators
  6. Collaborate across teams to develop innovative kernel optimization techniques
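To give a flavor of the optimizations listed above, here is a minimal sketch of loop tiling applied to a matrix multiply. This is generic Python for illustration only, not Neuron kernel code: real Neuron kernels are written against the Neuron SDK and its hardware-specific programming models, and the function name and signature here are invented for the example.

```python
def matmul_tiled(a, b, n, tile=4):
    """Multiply two n x n matrices (nested lists) with a tiled loop nest.

    Tiling reorders the iteration space into tile-sized blocks so that,
    on real hardware, each block of `a` and `b` stays resident in fast
    on-chip memory while it is reused, reducing traffic to slower memory.
    """
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):          # row tiles of the output
        for jj in range(0, n, tile):      # column tiles of the output
            for kk in range(0, n, tile):  # tiles along the reduction axis
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, n)):
                        acc = c[i][j]
                        for k in range(kk, min(kk + tile, n)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c
```

The same blocking idea underlies fusion and scheduling decisions in ML compilers: the loop structure, not the arithmetic, determines how well the memory hierarchy is used.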

Skills

Required

  • Software development management
  • Deep learning frameworks (PyTorch)
  • High-performance computing
  • Low-level optimization
  • System architecture
  • ML model acceleration
  • Compiler optimizations (fusion, sharding, tiling, scheduling)
  • Performance analysis and profiling
  • Customer engagement for model enablement

Nice to have

  • Experience with AWS Neuron SDK
  • Experience with AWS Inferentia and Trainium accelerators
  • Knowledge of ML compilers and runtimes
  • Experience with distributed architectures

What the JD emphasized

  • maximizing performance
  • high-performance kernels
  • optimal performance
  • push the boundaries
  • high-performance computing
  • cutting-edge research

Other signals

  • deep learning
  • GenAI
  • ML accelerators
  • high-performance kernels
  • ML compiler
  • runtime
  • inference
  • training