Software Development Manager, LLM Inference Model Enablement, Neuron SDK

Amazon · Big Tech · Cupertino, CA · Software Development

Software Development Manager to lead a team optimizing LLMs for inference on AWS custom accelerators (Trainium and Inferentia) through the Neuron SDK. The role focuses on improving model enablement speed and experience, and on advancing inference usability and quality through new features, infrastructure, tooling, and automation. Requires a strong background in LLM architectures, performance optimization, and distributed inference.
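
For a flavor of the hands-on side of the role, below is a minimal sketch of enabling an open-source LLM for Neuron inference, assuming the Hugging Face optimum-neuron front end to the Neuron SDK; the checkpoint, core count, and shapes are illustrative choices, not part of the job description:

    from optimum.neuron import NeuronModelForCausalLM
    from transformers import AutoTokenizer

    # One-time export step: trace the model and compile it with the Neuron
    # compiler for Trainium/Inferentia NeuronCores. Input shapes are fixed
    # at compile time, so batch_size and sequence_length must be chosen here.
    model = NeuronModelForCausalLM.from_pretrained(
        "meta-llama/Meta-Llama-3-8B",  # example checkpoint (assumption)
        export=True,                   # compile for Neuron rather than load a CPU/GPU model
        num_cores=2,                   # tensor-parallel degree across NeuronCores
        auto_cast_type="fp16",
        batch_size=1,
        sequence_length=2048,
    )

    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
    inputs = tokenizer("The Neuron SDK lets you", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Much of the team's work begins where a snippet like this ends: new architectures, MoE variants in particular, may need compiler, kernel, or distributed-inference support before they run well, and closing that model enablement gap is what this role owns.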

What you'd actually do

  1. Lead a team of expert AI/ML engineers to onboard and optimize state-of-the-art open-source and customer LLMs, both dense and MoE, for inference on Trainium and Inferentia accelerators via the Neuron SDK
  2. Drive improvements in model enablement speed and experience, while advancing inference usability and quality through inference features, infrastructure optimization, tools, and automation
  3. Define the model enablement and performance optimization approach for the latest state-of-the-art LLMs, then build and deliver those models to customers
  4. Continue improving the model onboarding experience, and enhance inference usability and quality for Neuron-supported models
  5. Manage shifting priorities as new models and technologies emerge, adapting your team's work to meet them

Skills

Required

  • 3+ years of engineering team management experience
  • 7+ years of experience working directly within engineering teams
  • 3+ years of experience designing or architecting new and existing systems (design patterns, reliability, scaling)
  • Experience partnering with product or program management teams

Nice to have

  • Experience in communicating with users, other technical teams, and senior leadership to collect requirements, describe software product features, technical designs, and product strategy
  • Experience recruiting, hiring, mentoring/coaching, and managing teams of software engineers to improve their skills and make them more effective product engineers
  • Strong background in LLM architectures
  • Experience with model performance optimization and inference techniques
  • Experience delivering high-performance models using distributed inference libraries

What the JD emphasized

  • optimize LLMs
  • inference

Other signals

  • inference on Trainium and Inferentia accelerators via the Neuron SDK
  • model enablement speed and experience
  • advancing inference usability and quality