Software Development Engineer - AI/ML, Amazon Neuron, Multimodal Inference

Amazon · Big Tech · Seattle, WA · Software Development

Software Development Engineer focused on accelerating deep learning and GenAI workloads on AWS's custom ML accelerators (Inferentia and Trainium) through the AWS Neuron SDK. The role involves architecting, implementing, and tuning distributed inference solutions, with an emphasis on latency and throughput optimization from the system level up to the framework level (PyTorch, JAX). The engineer will work on low-level optimizations, system architecture, and ML model acceleration, collaborating with hardware, compiler, runtime, and framework teams.

What you'd actually do

  1. Design, develop, and optimize machine learning models and frameworks for deployment on custom ML hardware accelerators.
  2. Participate in all stages of the ML system development lifecycle: distributed-computing-based architecture design, implementation, performance profiling, hardware-specific optimizations, testing, and production deployment.
  3. Build infrastructure to systematically analyze and onboard multiple models with diverse architectures.
  4. Design and implement high-performance kernels and features for ML operations, leveraging the Neuron architecture and programming models.
  5. Analyze and optimize system-level performance across multiple generations of Neuron hardware.

Skills

Required

  • Python
  • System-level programming
  • ML knowledge
  • Experience optimizing inference performance for both latency and throughput on large models
  • System-level optimizations
  • PyTorch or JAX optimization

Nice to have

  • Deep hardware knowledge and ML expertise
  • Distributed architectures
  • Low-level optimization and system architecture
  • ML model acceleration
  • Compiler, runtime, and framework internals
  • Kernel development
  • Performance profiling and hardware-specific optimizations
  • Fusion, sharding, tiling, and scheduling
  • Unit and end-to-end model testing
  • Continuous deployment pipelines
  • Collaboration with applied scientists and product managers
  • Debugging performance issues and optimizing memory usage
  • Software architecture
  • Metrics and automation
  • Root cause analysis of software defects

Other signals

  • AWS Neuron SDK
  • ML compiler, runtime, and application framework
  • accelerating deep learning and GenAI workloads
  • maximizing performance for AWS's custom ML accelerators
  • optimizing inference performance for both latency and throughput
  • distributed inference solutions
  • low-level optimization, system architecture, and ML model acceleration
  • building distributed inference support for PyTorch in the Neuron SDK
  • tuning models to ensure the highest performance and efficiency
  • hardware-specific optimizations