Machine Learning Engineer, AWS Neuron Inference, Annapurna ML

Amazon · Big Tech · Seattle, WA · Software Development

Machine Learning Engineer role focused on optimizing and tuning inference performance on AWS Neuron accelerators, specifically for large language models (LLMs) and other key ML model families. The role involves developing and performance-tuning building blocks for the distributed inference library, ensuring high performance and efficiency on Trn2 and Trn3 servers. Requires experience with LLM inference optimization, kernels, Python, PyTorch, or JAX.

What you'd actually do

  1. Develop, enable, and performance-tune building blocks for all key ML model families, including Llama3, GPT OSS, Qwen3, DeepSeek, and beyond.
  2. Create, build, and tune high-performance distributed inference solutions for the latest-generation Trainium accelerators.
  3. Develop technology components, create metrics, implement automation and other improvements, and resolve the root causes of software defects.
  4. Participate in design discussions and code reviews, and communicate with internal and external stakeholders.
  5. Work cross-functionally with teams across Neuron in a fast-paced, startup-like development environment, staying on top of the latest priorities as the AI landscape evolves.

Skills

Required

  • 3+ years of non-internship professional software development experience
  • 2+ years of non-internship experience designing or architecting new and existing systems (design patterns, reliability, and scaling)
  • Experience programming in at least one programming language
  • Experience optimizing LLM inference performance with kernels, Python, PyTorch, or JAX

Nice to have

  • 3+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
  • Bachelor's degree in computer science or equivalent

What the JD emphasized

  • Experience optimizing LLM inference performance with kernels, Python, PyTorch or JAX is a must

Other signals

  • AWS Inferentia and Trainium cloud-scale machine learning accelerators
  • performance tuning building blocks for all key ML model families
  • optimizing LLM inference performance with kernels, Python, PyTorch or JAX is a must