ML Kernel Performance Engineer, AWS Neuron, Annapurna Labs

Amazon · Big Tech · CA, ON +1 · Software Development

The role focuses on optimizing machine learning kernel performance for AWS's custom ML accelerators (Inferentia and Trainium): developing high-performance compute kernels, implementing compiler optimizations, and analyzing kernel-level performance. The work sits at the hardware-software boundary and targets optimal performance for deep learning and GenAI workloads.

What you'd actually do

  1. Design and implement high-performance compute kernels for ML operations, leveraging the Neuron architecture and programming models
  2. Analyze and optimize kernel-level performance across multiple generations of Neuron hardware
  3. Conduct detailed performance analysis using profiling tools to identify and resolve bottlenecks
  4. Implement compiler optimizations such as fusion, sharding, tiling, and scheduling (see the tiling sketch after this list)
  5. Work directly with customers to enable and optimize their ML models on AWS accelerators
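A minimal, hardware-agnostic sketch of the tiling idea from item 4, written in plain NumPy. This is not the AWS Neuron SDK; the tile size, function name, and shapes are illustrative assumptions. The point is the loop structure: each tile-sized block is a working set small enough to live in fast local memory, which is the same locality argument a fusion/tiling/scheduling pass makes in an ML compiler.

  # Illustrative sketch only: plain NumPy, not the Neuron SDK.
  # Tile size and shapes are assumptions chosen so tiles divide the
  # matrices evenly; real kernels derive them from on-chip buffer sizes.
  import numpy as np

  TILE = 64  # assumed tile edge

  def tiled_matmul(a: np.ndarray, b: np.ndarray, tile: int = TILE) -> np.ndarray:
      """Compute a @ b by accumulating tile-sized blocks.

      Tiling keeps each working set small so it can stay resident in
      fast local memory while being reused across the contraction loop.
      """
      m, k = a.shape
      k2, n = b.shape
      assert k == k2, "inner dimensions must match"
      out = np.zeros((m, n), dtype=np.result_type(a, b))
      for i in range(0, m, tile):
          for j in range(0, n, tile):
              # Accumulate partial products over the contraction dimension.
              for p in range(0, k, tile):
                  out[i:i + tile, j:j + tile] += (
                      a[i:i + tile, p:p + tile] @ b[p:p + tile, j:j + tile]
                  )
      return out

  if __name__ == "__main__":
      rng = np.random.default_rng(0)
      a = rng.standard_normal((256, 192)).astype(np.float32)
      b = rng.standard_normal((192, 320)).astype(np.float32)
      # Sanity-check the tiled result against the reference matmul.
      assert np.allclose(tiled_matmul(a, b), a @ b, atol=1e-3)
      print("tiled matmul matches reference")

In NumPy this will not outperform the reference a @ b (the underlying BLAS already tiles internally); on an accelerator, the tile sizes would instead be chosen to match on-chip buffer capacity and the compute engines' native matmul shape.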

Skills

Required

  • low-level optimization
  • system architecture
  • ML model acceleration
  • performance analysis
  • compiler optimizations
  • C/C++
  • Python

Nice to have

  • deep hardware knowledge
  • ML expertise
  • experience with ML frameworks (PyTorch)
  • distributed systems

What the JD emphasized

  • high-performance kernels for ML functions
  • optimal performance
  • push the boundaries of what's possible in AI acceleration
  • high-performance computing
  • distributed architectures
  • cutting-edge research
  • optimize machine learning workloads
  • low-level optimization
  • ML model acceleration

Other signals

  • AWS Neuron SDK
  • Inferentia and Trainium ML accelerators
  • ML compiler, runtime, and application framework
  • optimize machine learning workloads
  • high-performance compute kernels for ML operations