Software Engineer II - AI/ML, AWS Neuron, LLM Inference, Model Inference

Amazon · Big Tech · Cupertino, CA · Software Development

Software Engineer II role focused on optimizing LLM inference performance on AWS custom ML accelerators (Inferentia and Trainium) using the AWS Neuron SDK. This involves developing and tuning ML models and frameworks, building infrastructure for model onboarding, implementing low-level optimizations, and collaborating across hardware, software, and ML teams to ensure peak performance for customers.

What you'd actually do

  1. Design, develop, and optimize machine learning models and frameworks for deployment on custom ML hardware accelerators.
  2. Participate in all stages of the ML system development lifecycle, including distributed-computing-based architecture design, implementation, performance profiling, hardware-specific optimizations, testing, and production deployment.
  3. Build infrastructure to systematically analyze and onboard multiple models with diverse architectures.
  4. Design and implement high-performance kernels and features for ML operations, leveraging the Neuron architecture and programming models.
  5. Analyze and optimize system-level performance across multiple generations of Neuron hardware.

Skills

Required

  • Python
  • System-level programming
  • ML knowledge
  • Performance profiling
  • Hardware-specific optimizations
  • Low-level optimization
  • System architecture

Nice to have

  • PyTorch
  • JAX
  • Distributed computing
  • ML compiler
  • Runtime
  • Application framework
  • Deep learning
  • GenAI workloads

What the JD emphasized

  • Experience optimizing inference performance for both latency and throughput on large models, across the stack from system-level optimizations through to PyTorch or JAX, is a must-have.
  • Strong software development skills (Python and system-level programming) and ML knowledge are both critical to this role.

Other signals

  • AWS Neuron SDK
  • LLM Inference
  • ML accelerators
  • PyTorch
  • JAX
  • Performance tuning
  • Low-level optimization
  • System architecture