Sr Software Development Engineer, EC2 Nitro Machine Learning Systems

Amazon · Big Tech · Seattle, WA · Software Development

Senior Software Development Engineer role focused on building and scaling machine learning infrastructure for EC2 Nitro, supporting training and inference workloads for various ML applications including LLMs and multimodal systems. The role involves designing innovative technologies, leading technical projects, developing regression testing systems, and collaborating with hardware teams to optimize platform designs for ML performance.

What you'd actually do

  1. Design and develop innovative technologies that power the infrastructure supporting machine learning workloads
  2. Lead technical projects establishing EC2 as the definitive source for ML performance best practices across diverse applications including LLMs, multimodal systems, and emerging model architectures
  3. Develop and maintain comprehensive regression testing systems that validate performance across major component releases including frameworks, firmware, drivers, and networking infrastructure
  4. Collaborate with hardware engineering teams to influence future platform designs based on performance insights gathered from state-of-the-art research and customer workloads
  5. Build customer relationships by investigating complex performance challenges, developing solutions, and publishing actionable best practices through multiple channels

Skills

Required

  • 5+ years of non-internship professional software development experience
  • 5+ years of programming experience with at least one software programming language
  • 5+ years of experience leading the design or architecture (design patterns, reliability and scaling) of new and existing systems
  • Experience as a mentor, tech lead or leading an engineering team
  • C, C++ or Rust development in a Linux environment
  • Linux package management
  • Version control systems
  • Automated build processes
  • Software unit testing

Nice to have

  • In-depth knowledge of ML frameworks
  • Cluster management
  • Experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
  • Bachelor's degree in computer science or equivalent
  • Experience in embedded development in C/C++ or Rust

What the JD emphasized

  • ML frameworks
  • cluster management

Other signals

  • ML workloads
  • LLMs
  • multimodal systems
  • training and inference