Lower-level systems work that optimizes how trained models run on GPUs: scheduling, custom kernels, paged attention, speculative decoding.
Primary AI lifecycle stage: serving infrastructure.
As of today, 2,740 active AI roles across 208 companies in our index reference Inference infra. Hiring concentrates at the serving infrastructure (59%) and agents (25%) stages. Most common sectors: Big Tech, Semiconductors, Enterprise.
The companies with the most active Inference infra listings are: Amazon (334 roles), NVIDIA (326 roles), Google (176 roles), Capital One (107 roles), Microsoft (106 roles).
716 AI roles tagged inference_infra.
| Company | Title | Sector | AI score | Other tags |
|---|---|---|---|---|
| NVIDIA | Research Scientist, Generalist Embodied Agent Research - PhD New College Grad 2026 | Semiconductors | 10 | Embodied AI · Agent research · Multimodal · RL robotics · Model serving · Synthetic data |
| NVIDIA | Senior Systems Software Engineer, AI Stack and Performance - DGX Station | Semiconductors | 9 | Model serving · Agent orchestration |
| AMD | Fellow, AI Workload Optimization | Semiconductors | 9 | Model serving · Fine-tuning · Quantization · Multimodal · LLM observability |
| NVIDIA | Senior Machine Learning Engineer, Perception - Autonomous Driving | Semiconductors | 9 | Vision · Model serving |
| NVIDIA | Senior Software Engineer, DGX Cloud AI Infrastructure | Semiconductors | 9 | Model serving |
| NVIDIA | Software Engineer, DGX Cloud AI Infrastructure | Semiconductors | 9 | Model serving |
| NVIDIA | Senior Deep Learning Performance Architect | Semiconductors | 9 | Model serving |
| NVIDIA | AI Inference Performance Engineer - New College Grad 2026 | Semiconductors | 9 | Model serving · Quantization · Vision · Audio & speech |
| NVIDIA | Deep Learning Performance Software Engineer | Semiconductors | 9 | Model serving |
| AMD | Senior Software Development Engineer – LLM Inference Framework | Semiconductors | 9 | Model serving |
| NVIDIA | Senior Data Scientist - Security and Networking Research | Semiconductors | 9 | Agent orchestration · RAG · Tool use · Evals · Fine-tuning · Model serving · Multimodal · Synthetic data |
| NVIDIA | AI Computing Architect | Semiconductors | 9 | Model serving |
| AMD | 多模态算法工程师(模型优化方向)/ Multimodal Algorithm Engineer (Model Optimization) | Semiconductors | 9 | Multimodal · Embodied AI · Agent orchestration · Fine-tuning · Model serving · Quantization |
| NVIDIA | AI Workload and Networking Research Architect | Semiconductors | 9 | Model serving · Frontier research · Multimodal · LLM observability |
| NVIDIA | Senior High Performance AI Engineer | Semiconductors | 9 | Agent orchestration · Tool use · Code gen · Model serving · Agent research |
| AMD | Principal AI Performance Modeling Architect | Semiconductors | 9 | Model serving · Fine-tuning · Multimodal · Vision · Audio & speech |
| NVIDIA | Senior Quantum AI Research Scientist, Applied Research | Semiconductors | 9 | Frontier research · Agent research · RL post-training · Fine-tuning · Model serving |
| AMD | Senior Agentic System & Application Engineer | Semiconductors | 9 | Agent orchestration · Tool use · LLM observability · Model serving · Evals · Code gen |
| NVIDIA | Senior LLM Agents Architect | Semiconductors | 9 | Agent orchestration · Tool use · Evals · LLM observability · RAG · Model serving · Code gen |
| NVIDIA | Software Engineer, AI and DL Kernel Libraries - New College Grad 2026 | Semiconductors | 9 | Model serving |
| NVIDIA | Senior Performance Architect, Nemotron | Semiconductors | 9 | Model serving |
| Intel | Neuromorphic Applications Researcher - Temporary Position | Semiconductors | 9 | Embodied AI · Model serving |
| NVIDIA | Senior Research Scientist, Post-Training LLM and DLM | Semiconductors | 9 | Fine-tuning · RL post-training · Model serving · Evals |
| NVIDIA | Senior Software Engineer, Agentic AI – Nvidia Blueprints and NIM Integrations | Semiconductors | 9 | Agent orchestration · Tool use · Evals · RAG · Model serving |
| NVIDIA | Senior DL Algorithms Engineer - Inference Performance | Semiconductors | 9 | Model serving |
| NVIDIA | Senior Deep Learning Software Engineer, Inference | Semiconductors | 9 | Model serving · Multimodal |
| NVIDIA | Senior GenAI Technical Lead, Partner Platforms | Semiconductors | 9 | Agent orchestration · Tool use · RAG · Vector DB · Model serving · Multimodal |
| NVIDIA | Senior Solutions Architect, Autonomous Driving - GenAI | Semiconductors | 9 | Agent orchestration · Synthetic data · Model serving · Vision · Multimodal |
| NVIDIA | Machine Learning Applications and Compiler Engineer, LPX - New College Grad 2026 | Semiconductors | 9 | Model serving |
| NVIDIA | Senior AI Engineer, World Foundation Models | Semiconductors | 9 | Vision · Multimodal · Fine-tuning · Model serving · Evals · Frontier research |