Machine Learning Performance Engineer, Annapurna Labs

Amazon · Big Tech · Tel Aviv, IL · Software Development

This role focuses on optimizing the performance of the AWS Neuron software stack, which supports Generative AI and ML workloads on AWS's custom ML accelerators (Inferentia and Trainium). The engineer will analyze ML workloads, develop high-performance kernels, enhance the Neuron SDK, and collaborate with the compiler, frameworks, and hardware teams to maximize end-to-end performance. Responsibilities span instruction scheduling, memory management, parallelism, kernel optimization, and compiler enhancements, with a focus on ML inference and training performance.

What you'd actually do

  1. Optimizing system performance across the entire ML software stack
  2. Analyzing high-performance ML workloads running on Annapurna hardware
  3. Developing high-performance kernels for critical ML operations
  4. Enhancing the Neuron SDK to improve developer experience and system capabilities
  5. Collaborating across Compiler, Frameworks, and Hardware teams to maximize end-to-end performance

Skills

Required

  • Python
  • C++
  • TensorFlow
  • PyTorch
  • JAX
  • Performance optimization of LLM, vision, or other deep-learning models

Nice to have

  • Developing algorithms for simulation tools
  • Compiler optimization, kernel writing, or hardware-software co-design

What the JD emphasized

  • Performance optimization of LLM, vision, or other deep-learning models
  • Compiler optimization, kernel writing, or hardware-software co-design

Other signals

  • AWS Neuron software stack
  • Generative AI and other advanced ML workloads
  • AWS's custom-built ML accelerators
  • ML inference and training
  • ML systems performance and software
  • high-performance kernels
  • compiler enhancements
  • instruction scheduling
  • memory management
  • parallelism
  • kernel optimization
  • hardware-software co-design