Software Development Manager, EC2 Nitro

Amazon · Big Tech · Seattle, WA · Software Development

This role leads a performance engineering group focused on optimizing ML infrastructure for EC2, covering diverse workloads like LLMs and multimodal models. The responsibilities include building and managing a team, driving architectural decisions, developing performance measurement infrastructure, and establishing regression coverage across the ML systems stack from low-level optimization to serving layers.

What you'd actually do

  1. Lead the design, implementation, and delivery of foundational ML performance measurement infrastructure that operates as reliable CI/CD systems across diverse accelerator platforms
  2. Build and nurture a high-performing engineering team focused on establishing EC2's source for ML performance best-known-configurations
  3. Drive architectural decisions that influence future platform design by feeding insights from state-of-the-art research and customer workloads into accelerated platform launches
  4. Develop comprehensive regression coverage across all major component releases including frameworks, firmware, drivers, and networking components
  5. Establish mechanisms to scale performance engineering practices from current LLM focus to multimodal models, Mixture-of-Experts architectures, and emerging AI application domains

Skills

Required

  • 3+ years of engineering team management experience
  • 7+ years of experience working directly within engineering teams
  • 3+ years of experience designing or architecting new and existing systems (design patterns, reliability, and scaling)
  • Knowledge of engineering practices and patterns for the full software/hardware/networks development life cycle, including coding standards, code reviews, source control management, build processes, testing, certification, and livesite operations
  • Experience partnering with product or program management teams

Nice to have

  • Experience communicating with users, other technical teams, and senior leadership to collect requirements and describe software product features, technical designs, and product strategy
  • Experience recruiting, hiring, mentoring/coaching, and managing teams of Software Engineers to improve their skills and make them more effective product software engineers

What the JD emphasized

  • ML performance optimization
  • performance data
  • performance engineering
  • performance measurement infrastructure
  • performance best-known-configurations
  • performance regressions
  • performance engineering practices

Other signals

  • ML performance optimization
  • LLMs
  • multimodal models
  • AI applications
  • accelerated computing platforms
  • CUDA optimization
  • frameworks
  • serving layers
  • benchmarking infrastructure
  • CI/CD systems
  • Mixture-of-Experts architectures