Sr. Software Engineer - AI/ML, AWS Neuron Apps

Amazon · Big Tech · Seattle, WA · Software Development

Senior Software Engineer role focused on optimizing and deploying large AI models (LLMs and vision generative models) on AWS's custom AI accelerators, Inferentia and Trainium. The role involves architecting distributed inference solutions, optimizing performance across the stack from high-level frameworks down to hardware implementations, and developing tools for LLM accuracy and efficiency. It bridges ML frameworks (PyTorch, JAX) with AI hardware, with a focus on inference performance and scaling.

What you'd actually do

  1. Spearhead distributed inference architecture for PyTorch and JAX using XLA
  2. Engineer breakthrough performance optimizations for AWS Trainium and Inferentia
  3. Develop ML tools to enhance LLM accuracy and efficiency
  4. Transform complex tensor operations into highly optimized hardware implementations
  5. Pioneer benchmarking methodologies that shape next-gen AI accelerator design

Skills

Required

  • Python
  • ML framework internals
  • distributed systems
  • ML optimization
  • performance tuning
  • system architecture
  • AI acceleration via quantization, parallelism, model compression, batching, KV caching, vLLM serving
  • accuracy debugging & tooling
  • performance benchmarking of AI accelerators
  • machine learning and deep learning model fundamentals (architectures, training and inference lifecycles)
  • optimizations for improving model execution

Nice to have

  • PyTorch
  • JAX
  • XLA
  • CUDA kernels
  • HPC
  • inference optimization
  • tensor operations

What the JD emphasized

  • pioneer distributed inference solutions
  • optimize breakthrough language and vision generative AI models
  • drive performance benchmarking and tuning
  • architect the bridge between ML frameworks (PyTorch, JAX) and AI hardware
  • engineer breakthrough performance optimizations
  • develop ML tools to enhance LLM accuracy and efficiency
  • transform complex tensor operations into highly optimized hardware implementations
  • pioneer benchmarking methodologies that shape next-gen AI accelerator design
  • full-stack optimization from high-level frameworks to hardware-specific primitives
  • creation of tools and frameworks that define industry standards for ML deployment
  • experience with AI acceleration via quantization, parallelism, model compression, batching, KV caching, vLLM serving
  • experience with accuracy debugging & tooling, performance benchmarking of AI accelerators
  • fundamentals of machine learning and deep learning models (architectures, training and inference lifecycles), plus hands-on experience with optimizations for improving model execution

Other signals

  • deploying and optimizing some of the world's most sophisticated AI models at unprecedented scale