Currently tracking 427 active AI roles, up 208% versus the prior 4 weeks. Primary focus: Agent · Engineering. Salary range $65k–$331k (avg $193k).
| Title | Stage | AI score |
|---|---|---|
| Member of Technical Staff, Software Co-Design AI HPC Systems - MAI Superintelligence Team: This role focuses on the co-design and productionization of next-generation AI systems at datacenter scale, optimizing performance, efficiency, and cost across hardware and software. It involves analyzing workloads, driving architectural decisions, optimizing distributed systems for training and inference, and influencing AI hardware design. The role also includes performance modeling, prototyping, and mentoring. | Serve, Pretrain | 9 |
| Member of Technical Staff, AI Systems Engineer - Microsoft Superintelligence: The role focuses on integrating custom AI silicon with AI inference frameworks like SGLang, optimizing LLM inference performance, and developing custom operators. It involves working with hardware accelerators and potentially non-CUDA ecosystems, aiming to improve AI workload efficiency. | Serve | 8 |
| Member of Technical Staff, AI Networking - MAI Superintelligence Team: This role focuses on designing, scaling, and optimizing high-performance networks for AI training and inference clusters. The engineer will work on the end-to-end networking architecture, from link-layer to fabric-wide systems, connecting thousands of GPUs. Responsibilities include benchmarking, profiling, debugging, and tuning AI workloads, engineering ultra-low-latency networks, and designing congestion-free transport mechanisms. The goal is to build networking systems that directly accelerate Microsoft's frontier AI models and support the development of advanced AI systems. | Serve | 8 |
| Member of Technical Staff, Hardware Health - MAI Superintelligence Team: This role focuses on ensuring the reliability, performance, and availability of Microsoft's large-scale AI training infrastructure, which involves tens of thousands of GPUs and advanced networking. Responsibilities include designing transport, fabric architecture, telemetry, observability, and automated troubleshooting for these clusters. The role also involves bring-up of AI training and inference clusters, performance benchmarking, and root-cause analysis, with the goal of developing predictive health models and autonomous remediation systems. | Serve | 7 |