Technical Marketing Engineer – AI Training Workloads & Performance

AMD · Semiconductors · Santa Clara, CA · Engineering

Technical Marketing Engineer focused on AI training workloads and performance optimization for AMD GPUs. The role involves creating technical content, analyzing performance bottlenecks, and enabling customers to achieve strong results at each stage of the AI model lifecycle, from pre-training through fine-tuning and reinforcement learning.

What you'd actually do

  1. Partner with AMD’s AI software engineering team to develop performance-focused technical content for AI training workloads, including optimization guides, benchmarking results, scaling studies, and tuning methodologies.
  2. Serve as a subject matter expert on AI training performance across the full model lifecycle, including:
     • Large-scale pre-training (foundation models)
     • Fine-tuning and parameter-efficient methods (e.g., LoRA, PEFT)
     • Reinforcement learning workflows (e.g., RLHF, RLAIF)
     • Distillation and model compression techniques
     • Quantization-aware training (QAT)
  3. Develop and publish deep technical content for training workloads, including:
     • Performance analysis and bottleneck breakdowns
     • Scaling studies (single-node and multi-node)
     • Optimization guides for both pre-training and post-training workflows
     • Distributed training best practices (data/model/pipeline parallelism)
     • Workload-specific tuning strategies and competitive positioning insights
  4. Analyze and optimize training performance across key system dimensions, including compute utilization, memory efficiency, communication overhead, and scaling behavior in distributed environments.
  5. Engage with internal and external experts to validate performance claims against real-world scenarios and large-scale training runs.
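The performance-analysis work described above typically centers on metrics such as compute utilization and scaling efficiency. As a minimal illustrative sketch (the helper names and all example figures are hypothetical, not from this posting), two of the most common metrics can be computed like this:

```python
# Hypothetical helpers illustrating two metrics this role would report:
# Model FLOPs Utilization (MFU) and distributed scaling efficiency.

def mfu(tokens_per_sec: float, params: float, peak_flops: float) -> float:
    """Approximate MFU for a dense decoder-only transformer.

    Uses the common ~6 * params FLOPs-per-token estimate for a combined
    forward+backward pass (attention FLOPs ignored for brevity).
    """
    achieved_flops = 6.0 * params * tokens_per_sec
    return achieved_flops / peak_flops


def scaling_efficiency(throughput_1: float, throughput_n: float, n: int) -> float:
    """Fraction of ideal linear speedup achieved going from 1 to n nodes."""
    return throughput_n / (throughput_1 * n)


if __name__ == "__main__":
    # Example numbers (illustrative only): a 7B-parameter model training at
    # 12,000 tokens/s on hardware with ~1.3 PFLOP/s of peak BF16 compute.
    print(f"MFU: {mfu(12_000, 7e9, 1.3e15):.1%}")            # MFU: 38.8%
    # One node sustains 12,000 tokens/s; 8 nodes sustain 84,000 tokens/s.
    print(f"Scaling: {scaling_efficiency(12_000, 84_000, 8):.1%}")  # Scaling: 87.5%
```

A scaling study of the kind the role calls for would sweep node counts and report this efficiency curve alongside the communication-overhead breakdown that explains any falloff.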

Skills

Required

  • Python
  • C/C++
  • ROCm or CUDA
  • distributed training frameworks (PyTorch, DeepSpeed, Megatron-LM)
  • technical content creation
  • performance analysis
  • benchmarking
  • customer enablement

Nice to have

  • experience with ISVs, hyperscalers, or large-scale AI deployments
  • solution validation
  • certification frameworks
  • Markdown
  • Read the Docs
  • Jupyter Notebooks

What the JD emphasized

  • AI training workloads
  • performance optimization
  • distributed training
  • large-scale pre-training
  • fine-tuning
  • reinforcement learning workflows
  • scaling studies
  • performance analysis
  • bottleneck breakdowns
