Principal PMT-ES - AI/ML Training, Annapurna Labs

Amazon · Big Tech · Cupertino, CA · Project/Program/Product Management--Technical

This role is for a Principal Technical Product Manager who will define and drive product strategy for training software on AWS Trainium, including distributed training libraries, post-training workflows (RLHF, DPO, fine-tuning), reinforcement learning frameworks, and training performance optimization. The role focuses on enabling researchers and operators to train frontier models at scale.

What you'd actually do

  1. Define and execute the training product strategy and roadmap, working backwards from customer requirements, in collaboration with engineering leadership.
  2. Drive strategy for post-training workflows including RLHF, DPO, reward modeling, and fine-tuning at scale.
  3. Work with BD, Solutions Architecture, and GTM teams to engage customers training frontier models on Trainium.
  4. Define how Neuron supports the AI/ML training ecosystem and which tools customers will use for their training workflows on Trainium.
  5. Lead end-to-end launches for training capabilities, coordinating documentation, field enablement, and customer communications.

Skills

Required

  • Product strategy and roadmap definition
  • Customer requirements analysis
  • Engineering leadership collaboration
  • Distributed training systems
  • Model parallelism strategies
  • Training performance optimization
  • AI/ML training architectures
  • Post-training workflows (RLHF, DPO, fine-tuning)
  • Reinforcement learning frameworks
  • Customer engagement and enablement
  • AI/ML ecosystem understanding
  • Technical product management
  • Written and verbal communication

Nice to have

  • Experience with AWS Trainium
  • Experience with AWS Neuron
  • Experience working with compiler, runtime, NKI, and infrastructure PMs
  • Open source community engagement
  • GTM strategy

What the JD emphasized

  • frontier models
  • distributed training
  • post-training workflows
  • RLHF
  • fine-tuning
  • reinforcement learning
  • performance optimization
  • AI/ML ecosystem
  • training software

Other signals

  • AWS Trainium
  • AWS Neuron