Principal Technical Product Manager - Accelerator Optimization

Microsoft · Big Tech · Mountain View, CA, and 3 other locations · Product Management

The Principal Technical Product Manager will own the AMD inference platform optimization strategy for Azure AI services, focusing on performance, correctness, stability, compatibility, and release quality for AI models running on AMD GPUs. The role covers defining product strategy, roadmaps, and execution, and partnering with engineering teams and external vendors to make AMD GPUs a first-class production platform for AI inference.

What you'd actually do

  1. Work with customers and AMD partners to define and drive the product strategy, roadmap, and execution for AI model optimization on AMD GPU platforms, including current and future hardware generations.
  2. Partner closely with engineering teams across kernels, runtime, compiler, performance, quality, and service deployment to prioritize work across performance optimization, correctness, compatibility, stability, bug resolution, and release quality.
  3. Own AMD-specific technical product decisions across the inference stack, including platform tradeoffs involving throughput, latency, scalability, memory behavior, quality, reliability, cost efficiency, and time to delivery.
  4. Ensure AMD implementations meet production standards for performance and operational quality. Own and drive improvements to KPIs for throughput, latency, and hardware efficiency (e.g., GPU utilization, capacity savings), and track impact using standardized performance and cost metrics.
  5. Drive release readiness across the AMD release lifecycle, including planning, validation, regression prevention, deployment readiness, and production handoff.

Skills

Required

  • Product/service/program management or software development
  • AI/ML infrastructure
  • GPU systems
  • Distributed systems
  • Model serving
  • Compilers
  • Runtime systems
  • Performance engineering
  • Cloud platform architecture
  • Working with external ecosystem partners, silicon vendors, or platform partners on technical product delivery
  • Improving release readiness, CI quality, benchmarking, regression detection, or service reliability processes for technical products or platforms

Nice to have

  • Familiarity with AMD and/or NVIDIA GPU ecosystems, model execution frameworks, compiler/runtime stacks, or hardware/software co-optimization
  • Reading and/or writing code

What the JD emphasized

  • AMD GPU platforms
  • performance
  • correctness
  • stability
  • compatibility
  • release quality
  • inference platform
  • AMD GPU
  • AMD implementations
  • production standards
  • operational quality
  • throughput
  • latency
  • hardware efficiency
  • AMD release lifecycle
  • AMD engineering
  • AMD-specific production issues
  • AMD releases

Other signals

  • AI model optimization
  • Azure AI services
  • performance, correctness, stability, compatibility, and release quality