Principal AI Software Engineer

AMD · Semiconductors · Chengdu, China · Engineering

Seeking a Principal AI Software Engineer to lead the design and development of next-generation AI inference systems, intelligent model routing, and cloud-native deployment technologies for AMD Instinct GPUs. This role involves working with LLM serving, semantic routing, Kubernetes, Envoy, AI gateways, and open-source infrastructure to enable high-performance, production-ready AI software.

What you'd actually do

  1. Lead the design and development of intelligent routing technologies for LLM serving on AMD Instinct GPUs, including semantic routing, workload-aware routing, policy-based routing, and multi-model inference orchestration.
  2. Drive AMD enablement and optimization for vLLM Semantic Router and related open-source AI gateway technologies, ensuring strong support for ROCm and AMD GPU platforms.
  3. Collaborate with AMD architecture, ROCm, kernel, compiler, and AI framework teams to identify and optimize bottlenecks in LLM inference workloads.
  4. Develop production-quality software components for AI inference systems, including routers, gateways, control-plane services, observability tools, policy engines, and deployment automation.
  5. Build and optimize integrations across vLLM, Kubernetes, Envoy, Gateway API, service mesh, and AI gateway ecosystems.

Skills

Required

  • cloud-native infrastructure
  • open-source development
  • AI inference systems
  • Kubernetes
  • Envoy
  • AI gateways
  • vLLM
  • semantic routing
  • multi-model serving
  • policy-driven routing
  • semantic caching
  • observability
  • privacy-aware routing
  • workload-aware optimization
  • Go
  • Rust
  • Python
  • C/C++

Nice to have

  • AMD Instinct GPUs
  • ROCm
  • Gateway API
  • service mesh
  • ingress/gateway controllers
  • CNCF
  • SGLang
  • TensorRT-LLM
  • prompt classification
  • tool routing
  • agent routing
  • Linux systems
  • containerized deployments
  • distributed debugging
  • production reliability
  • CUDA
  • ONNX Runtime
  • PyTorch
  • performance profiling
  • benchmarking
  • latency optimization
  • memory optimization
  • high-concurrency serving systems
  • technical writing
  • public speaking
  • community engagement
  • cross-functional collaboration

What the JD emphasized

  • building scalable systems from 0 to 1
  • driving open-source communities
  • solving complex performance, reliability, and deployment challenges
  • translating emerging AI infrastructure trends into practical software solutions

Other signals

  • LLM serving
  • intelligent routing
  • cloud-native deployment
  • AMD Instinct GPUs
  • ROCm software stack