Lead AI/ML Software Use Cases Validation Engineer

AMD · Semiconductors · Bangalore, India · Engineering

Lead AI/ML Software Use Cases Validation Engineer at AMD in Bangalore, India. This role centers on validating AI/ML compute stacks and Physical AI SDKs, leading end-to-end AI pipeline validation, benchmarking, and performance optimization on Linux platforms. The engineer will work with other teams to deliver production-quality AI solutions aligned with customer use cases, covering model training, conversion, optimization, kernel execution, inference accuracy, and system-level optimizations.

What you'd actually do

  1. Lead validation and quality ownership of AI/ML compute stacks on Ubuntu and Yocto
  2. Define validation strategy, test architecture, and coverage across functional, performance, stress, regression, and scalability testing
  3. Own and drive the defect lifecycle, including triage, root cause analysis, and closure
  4. Validate end-to-end AI pipelines, including model training, conversion, and optimization (e.g., PyTorch → ONNX); kernel execution; memory transfers; and inference accuracy
  5. Define and execute AI benchmarking and profiling strategies for training and inference workloads

Skills

Required

  • 8–12 years of experience in AI/ML software validation or performance engineering
  • Strong expertise in Python scripting and test automation
  • Strong ML fundamentals including deep learning and LLMs
  • Hands-on experience with ROCm validation, performance profiling, and optimization
  • Experience with HIP, CUDA, OpenCL, and TensorFlow/PyTorch integrations
  • Proven experience validating end-to-end AI pipelines
  • Strong Linux expertise (Ubuntu, Yocto)

Nice to have

  • Robotics platforms
  • SIL/HIL environments
  • Real-world deployments
  • Heterogeneous accelerators

What the JD emphasized

  • production-quality AI solutions
  • end-to-end AI pipelines
  • AI benchmarking and profiling
  • system-level and model-level optimizations
  • ROCm validation
  • Python scripting and test automation
  • deep learning and LLMs
  • production readiness

Other signals

  • validation of AI/ML compute stacks
  • end-to-end AI pipeline validation
  • benchmarking and performance optimization
  • production-quality AI solutions