Software Engineer II - AI/ML, AWS Neuron

Amazon · Big Tech · Seattle, WA · Software Development

A Software Engineer II role focused on optimizing and enabling deep learning and generative AI workloads on AWS custom ML accelerators (Inferentia and Trainium) by developing and enhancing the AWS Neuron SDK. The work spans the stack, from frameworks such as PyTorch and JAX down to the hardware-software boundary, and includes optimizing ML compilers, runtimes, and high-performance kernels for inference and training. The role requires strong software development skills in Python and C++, system-level programming experience, ML knowledge, and collaboration with multiple teams to ensure optimal performance for customers.

What you'd actually do

  1. Design, develop, and optimize machine learning models and frameworks for deployment on custom ML hardware accelerators.
  2. Participate in all stages of the ML system development lifecycle, including distributed-computing architecture design, implementation, performance profiling, hardware-specific optimizations, testing, and production deployment.
  3. Build infrastructure to systematically analyze and onboard multiple models with diverse architectures.
  4. Analyze and optimize system-level performance across multiple generations of Neuron hardware.
  5. Conduct detailed performance analysis using profiling tools to identify and resolve bottlenecks.

Skills

Required

  • Python
  • C++
  • System-level programming
  • ML knowledge
  • Software development
  • Distributed computing
  • Performance profiling
  • Hardware-specific optimizations
  • Testing
  • Production deployment
  • Performance analysis
  • Profiling tools
  • Debugging performance issues
  • Optimizing memory usage
  • Software architecture
  • Automation
  • Root cause analysis
  • Design discussions
  • Code review

Nice to have

  • PyTorch
  • JAX
  • LLMs
  • Generative AI
  • AWS Neuron SDK
  • Inferentia
  • Trainium
  • ML compilers
  • Runtimes
  • Kernels
  • High-performance computing
  • Distributed architectures
  • Inference capabilities
  • Open Source Community

What the JD emphasized

  • optimize
  • performance
  • accelerators
  • hardware
  • software
  • machine learning
  • inference
  • training

Other signals

  • ML compiler
  • runtime
  • application framework
  • PyTorch
  • JAX
  • ML inference
  • training performance
  • LLM model families
  • distributed training
  • AWS Trainium silicon
  • hardware-software boundary
  • kernels for ML functions
  • AI acceleration
  • high-performance computing
  • distributed architectures
  • inference capabilities
  • Generative AI applications
  • Open Source Community