Software Development Engineer - AI/ML, AWS Neuron

Amazon · Big Tech · Cupertino, CA · Software Development

Software Development Engineer focused on optimizing and enabling deep learning and GenAI workloads, specifically LLMs, on AWS's custom ML accelerators (Inferentia and Trainium) through the Neuron SDK. The role involves system-level and low-level optimizations for inference performance, working across framework, kernel, and hardware boundaries.

What you'd actually do

  1. Design, develop, and optimize machine learning models and frameworks for deployment on custom ML hardware accelerators.
  2. Participate in all stages of the ML system development lifecycle, including distributed-computing-based architecture design, implementation, performance profiling, hardware-specific optimizations, testing, and production deployment.
  3. Build infrastructure to systematically analyze and onboard multiple models with diverse architectures.
  4. Design and implement high-performance kernels and features for ML operations, leveraging the Neuron architecture and programming models.
  5. Analyze and optimize system-level performance across multiple generations of Neuron hardware.

Skills

Required

  • Python
  • System-level programming
  • ML knowledge
  • low-level optimization
  • system architecture
  • ML model acceleration
  • performance profiling
  • hardware-specific optimizations
  • testing
  • production deployment
  • distributed computing
  • ML operations
  • Neuron architecture
  • Neuron programming models
  • performance analysis
  • profiling tools
  • fusion
  • sharding
  • tiling
  • scheduling
  • unit testing
  • end-to-end model testing
  • continuous deployment
  • releases through pipelines
  • customer enablement
  • optimization expertise

Nice to have

  • PyTorch
  • JAX
  • compiler
  • runtime
  • collectives
  • future architecture designs
  • open source ecosystems

What the JD emphasized

  • critical
  • must have
  • Optimizing inference performance for both latency and throughput on such large models, across the stack from system-level optimizations through to PyTorch or JAX, is a must-have.

Other signals

  • AWS Neuron SDK
  • ML accelerators (Inferentia, Trainium)
  • LLM model families
  • inference performance optimization
  • low-level optimization
  • system architecture