Skills

Required

cloud-native infrastructure
open-source development
AI inference systems
Kubernetes
Envoy
AI gateways
vLLM
semantic routing
multi-model serving
policy-driven routing
semantic caching
observability
privacy-aware routing
workload-aware optimization
Go
Rust
Python
C/C++

Nice to have

AMD Instinct GPUs
ROCm
Gateway API
service mesh
ingress/gateway controllers
CNCF
SGLang
TensorRT-LLM
prompt classification
tool routing
agent routing
Linux systems
containerized deployments
distributed debugging
production reliability
CUDA
ONNX Runtime
PyTorch
performance profiling
benchmarking
latency optimization
memory optimization
high-concurrency serving systems
technical writing
public speaking
community engagement
cross-functional collaboration

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. **Together, we advance your career.**

THE ROLE:

AMD is looking for a strategic software engineering lead to drive next-generation AI inference systems, intelligent model routing, and cloud-native deployment technologies for AMD Instinct GPUs. In this role, you will work at the intersection of LLM serving, semantic routing, Kubernetes, Envoy, AI gateways, and open-source infrastructure. You will be a member of a core team of talented industry specialists focused on enabling high-performance, production-ready AI software on the latest AMD hardware and ROCm software stack.

This role is especially focused on advancing intelligent routing and system-level optimization for LLM inference, including vLLM, vLLM Semantic Router, multi-model serving, policy-driven routing, semantic caching, observability, privacy-aware routing, and workload-aware optimization across AMD GPU platforms.

THE PERSON:

The ideal candidate is a hands-on technical leader with deep expertise in cloud-native infrastructure, open-source development, and AI inference systems. The candidate should be passionate about building scalable systems from 0 to 1, driving open-source communities, and solving complex performance, reliability, and deployment challenges.

The candidate should be comfortable working across engineering, architecture, product, and partner teams as well as open-source communities. Strong communication, technical writing, public speaking, and community leadership skills are essential. The successful candidate will be able to translate emerging AI infrastructure trends into practical software solutions that strengthen AMD’s position in the open-source AI ecosystem.

KEY RESPONSIBILITIES:

Lead the design and development of intelligent routing technologies for LLM serving on AMD Instinct GPUs, including semantic routing, workload-aware routing, policy-based routing, and multi-model inference orchestration.
Drive AMD enablement and optimization for vLLM Semantic Router and related open-source AI gateway technologies, ensuring strong support for ROCm and AMD GPU platforms.
Collaborate with AMD architecture, ROCm, kernel, compiler, and AI framework teams to identify and optimize bottlenecks in LLM inference workloads.
Develop production-quality software components for AI inference systems, including routers, gateways, control-plane services, observability tools, policy engines, and deployment automation.
Build and optimize integrations across vLLM, Kubernetes, Envoy, Gateway API, service mesh, and AI gateway ecosystems.
Apply a data-driven approach to performance analysis, including benchmarking, profiling, latency analysis, throughput optimization, and cost-efficiency evaluation.
Contribute to open-source communities and represent AMD in key AI infrastructure projects, including vLLM, Kubernetes, Envoy, Gateway API, and related CNCF ecosystems.
Develop technical relationships with external partners, customers, researchers, and community maintainers to accelerate AMD adoption in AI inference workloads.
Create technical documentation, blogs, demos, reference architectures, and conference presentations to showcase AMD’s AI software capabilities.
Participate in new AMD GPU platform bring-up activities by validating AI inference software stacks, debugging system-level issues, and developing early proof points for emerging workloads.
Research and prototype new approaches for system intelligence in LLM serving, including semantic caching, prompt classification, privacy-aware routing, safety-aware routing, tool routing, agent routing, and workload-router-pool architectures.

PREFERRED EXPERIENCE:

Strong software engineering background with experience building distributed systems, cloud-native infrastructure, AI infrastructure, or high-performance serving systems.
Deep experience with Kubernetes, Envoy, Gateway API, service mesh, ingress/gateway controllers, or cloud-native networking.
Experience contributing to or maintaining major open-source projects, preferably in CNCF, Kubernetes, Envoy, Istio, vLLM, or AI infrastructure communities.
Experience with LLM serving frameworks such as vLLM, SGLang, TensorRT-LLM, or related inference-serving systems.
Experience with semantic routing, AI gateways, model routing, policy-based routing, semantic caching, prompt classification, or multi-model inference orchestration.
Strong programming skills in Go, Rust, Python, and/or C/C++. Experience with Go and Rust is especially valuable for cloud-native control planes, gateways, and high-performance routing systems.
Experience with Linux systems, containerized deployments, distributed debugging, observability, and production reliability.
Familiarity with GPU-accelerated AI workloads, ROCm, CUDA, ONNX Runtime, PyTorch, or inference performance optimization is a strong plus.
Experience with performance profiling, benchmarking, latency optimization, memory optimization, and high-concurrency serving systems.
Ability to write high-quality, maintainable code with strong attention to architecture, reliability, testing, and operational simplicity.
Strong technical communication skills, including technical writing, public speaking, community engagement, and cross-functional collaboration.
Demonstrated ability to lead complex technical projects from concept to production and influence without direct authority across organizations and open-source communities.
Motivating technical leader with excellent interpersonal skills and the ability to work effectively in global, distributed teams.

ACADEMIC CREDENTIALS:

Bachelor’s or Master’s degree in Computer Science, Software Engineering, Computer Engineering, Electrical Engineering, or equivalent experience.
Advanced degree or research experience in AI systems, distributed systems, cloud-native infrastructure, machine learning systems, or high-performance computing is a plus.


Benefits offered are described in AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD’s “Responsible AI Policy” is available here.


This posting is for an existing vacancy.