Senior Software Development Engineer - AI/ML, AWS Neuron, Multimodal Inference

Amazon · Big Tech · Seattle, WA · Software Development

Senior Software Development Engineer for AWS Neuron, focusing on accelerating deep learning and GenAI workloads on Amazon's custom ML accelerators (Inferentia and Trainium). The role involves designing, developing, and optimizing ML models and frameworks for deployment, with a strong emphasis on distributed inference, performance tuning (latency and throughput), and system-level optimizations for LLMs.

What you'd actually do

  1. Design, develop, and optimize machine learning models and frameworks for deployment on custom ML hardware accelerators.
  2. Participate in all stages of the ML system development lifecycle, including distributed-computing-based architecture design, implementation, performance profiling, hardware-specific optimizations, testing, and production deployment.
  3. Build infrastructure to systematically analyze and onboard multiple models with diverse architectures.
  4. Design and implement high-performance kernels and features for ML operations, leveraging the Neuron architecture and programming models.
  5. Analyze and optimize system-level performance across multiple generations of Neuron hardware.

Skills

Required

  • Python
  • System-level programming
  • ML knowledge
  • distributed computing
  • performance profiling
  • hardware-specific optimizations
  • testing
  • production deployment
  • high-performance kernels
  • Neuron architecture
  • programming models
  • system-level performance analysis
  • optimization techniques
  • debugging performance issues
  • optimizing memory usage
  • software architecture

Nice to have

  • PyTorch
  • JAX
  • compiler
  • runtime
  • collectives
  • future architecture designs
  • Generative AI applications
  • Open Source Community
  • automation
  • software defects

What the JD emphasized

  • critical to this role
  • must have
  • performance tuning
  • optimize inference performance
  • system level optimizations
  • high-performance kernels
  • optimize machine learning workloads
  • low-level optimization
  • system architecture
  • ML model acceleration
  • performance profiling
  • hardware-specific optimizations
  • performance analysis
  • optimize system-level performance
  • performance issues
  • optimizing memory usage

Other signals

  • AWS Neuron SDK
  • ML accelerators
  • Inference Enablement and Acceleration
  • LLM model families
  • distributed inference