Currently tracking 427 active AI roles, a 208% increase over the prior four weeks. Primary focus: Agent · Engineering. Salary range: $65k–$331k (avg $193k).
| Title & summary | Stage | AI score |
|---|---|---|
| Principal Software Engineer Principal Software Engineer to advance ad-serving infrastructure, focusing on the performance, efficiency, and scalability of next-gen model serving and inference platforms for Ads. Designs and optimizes high-performance serving systems and GPU inference frameworks for deep learning and LLM workloads. | Serve | 7 |
| Senior Principal Engineering Manager Lead and grow a team building and operating world-class research compute infrastructure, including large-scale GPU clusters and agentic development tools, for Microsoft Research globally. | Serve | 7 |
| Senior Software Engineer--Backend--Microsoft Copilot Senior Software Engineer for Microsoft Copilot's backend platform, focusing on scaling AI services, integrating AI models, and providing tools for engineers. Requires strong backend and cloud infrastructure experience. | Serve | 7 |
| Principal Product Manager/Architect - Foundry Inference Platform (CoreAI) The Principal Product Manager/Architect will define and guide the technical architecture of Microsoft Foundry, an AI inferencing platform focused on reliability, scalability, and efficiency for large-scale GPU fleets. The role involves setting product direction for reliability, GPU fleet efficiency, capacity management, and engaging with strategic customers. Success metrics include platform reliability, GPU utilization, and customer outcomes. | Serve | 7 |
| Senior Software Engineer - CoreAI Model Inference & Serving Senior Software Engineer role focused on building and scaling the AI data-plane for LLM inferencing across Microsoft and Azure. The role involves designing, coding, and shipping core serving systems, smart routing, and request distribution for a wide range of LLMs, aiming for reliability, efficiency, and ultra-low latency. | Serve | 7 |
| Principal Software Engineering--Backend--Microsoft Copilot This role focuses on building and scaling the backend platform for Microsoft Copilot, integrating with AI models and empowering Copilot teams. The engineer will design, develop, and maintain performant and secure AI Platform services, ensuring reliability, scalability, and performance. The role requires experience with public cloud infrastructure, containerization, and production software development. | Serve | 7 |
| MTS - Platform Engineering Manager This role is for a Platform Engineering Manager at Microsoft AI, focusing on building and scaling the AI platform services that power Copilot. The role involves managing a team to develop secure, performant APIs for finetuning and deploying core AI experiences, collaborating with various teams, and ensuring high-quality code delivery in a fast-paced consumer-facing environment. | Serve | 7 |
| Principal Software Engineer, CoreAI This role focuses on building and operating the foundational GPU accelerated infrastructure for large-scale AI training and inference across Azure. It involves designing systems for GPU management, scheduling, isolation, and sharing, as well as optimizing performance, reliability, and utilization of GPU fleets. The role also requires driving end-to-end platform features, including observability and diagnostics, and influencing platform architecture. | Serve | 7 |
| Senior Software Engineer--Infra-Microsoft Copilot The role focuses on building and scaling the backend platform for Microsoft Copilot, including integrations with AI models and tools for engineering teams. The engineer will design, develop, and maintain performant and secure AI Platform services, ensuring reliability, scalability, and performance. This involves working with public cloud infrastructure, containerization technologies, and production software release. | Serve | 7 |
| Principal Product Manager Principal Product Manager for Azure AI Foundry and Azure ML, shaping strategy for AI/ML and GenAI platforms including training, deployment, monitoring, and governance. Focuses on developer-centric AI platforms enabling organizations to build, deploy, and operate AI systems at scale. | Serve, Post-train | 7 |
| Product Manager II - Foundry Model Inference (CoreAI) Product Manager II for Microsoft Foundry, focusing on the AI-first application stack, model serving platform, and generative AI development. The role involves defining product offerings, identifying quality improvement opportunities, tracking metrics, and collaborating with engineering and go-to-market teams to deliver integrated solutions for customers, including those in highly regulated industries. | Serve | 7 |
| Principal Researcher - Systems & Networking - Microsoft Research Principal Researcher in Systems and Networking with a focus on AI-driven methods for systems innovation, performance, efficiency, and scalability. The role involves developing new methodologies, collaborating with cross-functional teams, and publishing research findings. | Serve | 7 |
| Principal Researcher - Cloud and AI Infrastructure - Microsoft Research Principal Researcher at Microsoft Research focused on advancing cloud and AI infrastructure architecture, and chip design using AI technologies. The role involves investigating hardware trends, designing and optimizing hardware components, conducting simulations, developing prototypes, and collaborating with cross-functional teams to integrate intelligent systems across computing layers. | Serve | 7 |
| Firmware Engineer Firmware Engineer role focused on designing, developing, and debugging firmware for Azure's custom AI accelerator silicon. This involves working across silicon, hardware, and software teams to enable advanced AI workloads and support data center deployment. | Serve | 7 |
| Member of Technical Staff, Site Reliability Engineer (HPC) - MAI SuperIntelligence Team The role is for a Site Reliability Engineer (SRE) focused on High Performance Computing (HPC) infrastructure for AI model training and inference. The engineer will ensure the reliability, availability, and efficiency of large-scale distributed AI systems, including GPU clusters, and will be involved in monitoring, automation, incident management, and security. | Serve | 7 |
| Member of Technical Staff, HPC Operations Engineering Manager This role manages a team of Site Reliability Engineers responsible for the reliability and efficiency of large-scale distributed AI infrastructure, specifically for training, fine-tuning, and serving generative AI models. The focus is on leading operations, observability, automation, incident management, and security within hybrid cloud/on-prem CPU+GPU environments, collaborating closely with ML engineers and platform teams. | Serve, Post-train | 7 |
| Software Engineer Software Engineer role focused on building and scaling the inferencing cloud for Large Language Models and GenAI Services within Azure CoreAI Platform. The role involves designing, building, and operating large-scale engineering systems for AI models. | Serve | 7 |
| Senior Software Engineer Senior Software Engineer role focused on designing, developing, and optimizing Azure's High Performance Computing and AI Platform (HPC/AI) virtual machines. This involves deep technical work on hardware/software interactions, device virtualization, and performance analysis of GPU workloads for large-scale AI training and inference. The role contributes to the underlying platform software and its exposure as an Azure service, with opportunities to work on upper layers of Azure infrastructure. | Serve | 7 |
| Research Intern - AI Systems and Tools Research Intern role focused on developing AI systems and tools, particularly developer tools for Microsoft's custom Maia AI hardware. This involves working on profilers, debuggers, performance analysis tools, and simulators to enable efficient execution of AI models on AI accelerators. The role collaborates with AI researchers, hardware teams, and AI compilers teams, and involves work on device firmware, host software, and integration with AI/ML frameworks. | Serve | 7 |
| Senior Software Engineer This role focuses on designing and developing next-generation networking infrastructure for large-scale AI training and inference in Azure Cloud. The engineer will work on high-performance, low-latency, and low-jitter communication frameworks, optimizing scalability and reliability for distributed AI workloads. | Serve | 7 |
| Principal Software Engineer Principal Software Engineer role focused on designing, developing, and optimizing networking infrastructure for large-scale AI training and inference in Azure Cloud. The role emphasizes high performance, low latency, and reliability for distributed AI workloads, working with AI accelerators and advanced networking technologies. | Serve | 7 |
| Senior Software Engineer The role focuses on designing and building cutting-edge networking infrastructure for large-scale AI training and inference in Azure Cloud. The goal is to enable breakthroughs in AI by delivering unmatched computational power, scalability, and reliability, with a focus on high performance, low latency, and minimal jitter for distributed AI workloads. | Serve | 7 |
| Member of Technical Staff - Software Engineer (SuperIntelligence team) This role focuses on building and operating the core platform infrastructure for training, evaluating, and deploying large-scale AI models within Microsoft. It involves designing scalable services for cluster orchestration, job scheduling, data pipelines, and artifact management, with a strong emphasis on production operations, cloud platforms (Azure), and enhancing developer experience for AI research and engineering teams. | Serve | 7 |
| Member of Technical Staff, Hardware Health - MAI Superintelligence Team This role is focused on ensuring the reliability, performance, and availability of Microsoft's large-scale AI training infrastructures, which involve tens of thousands of GPUs and advanced networking. The responsibilities include designing transport, fabric architecture, telemetry, observability, and automated troubleshooting for these clusters. The role also involves AI training and inference cluster bring-up, performance benchmarking, and root-cause analysis, with a goal of developing predictive health models and autonomous remediation systems. | Serve | 7 |
| Research Intern - AI System Architecture Modeling and Performance Research Intern role focused on AI system architecture modeling and performance within Azure's hyperscale infrastructure. The intern will evaluate hardware/software co-design opportunities, optimize CPU, GPU, and networking infrastructure for AI accelerators, and develop methodologies for performance analysis and architectural idea evaluation. | Serve | 7 |
| Research Intern - AI Hardware Research Intern role focused on AI Hardware, specifically on chip architectures for efficient AI serving and inference systems. The role involves research, analysis, documentation, and innovation in collaboration with researchers and engineers. | Serve | 7 |
| Research Intern - AI Frameworks (Network Systems and Tools) Research intern focused on next-generation AI systems, specifically exploring disaggregated inference, memory-architecture, and interconnect technologies for LLM serving, with a focus on request scheduling and KV caching optimizations. The role involves investigating and evaluating disaggregated KV cache architectures and building a P2P service KV cache sharing architecture. | Serve | 7 |
| Technical Program Manager - Infrastructure Technical Program Manager for AI Infrastructure at Microsoft AI, focusing on building and optimizing platforms for large-scale foundation model training, deployment, and serving. The role involves coordinating projects, collaborating with researchers and engineers, and driving progress in a 0->1 environment. | Serve, Post-train | 7 |
| Senior Researcher - Systems and Networking, Microsoft Research Senior Researcher in Systems and Networking at Microsoft Research, focusing on AI-driven methods for system innovation, performance, efficiency, and scalability. The role involves developing and implementing new methodologies, collaborating with cross-functional teams, and publishing research findings. Requires a Doctorate and background in systems/networking with knowledge of ML systems, databases, and networking technologies, including Agent Systems and Vector Databases. | Serve | 7 |
| Research Intern - Hardware/Software Codesign Research intern focused on advancing the efficiency of AI systems through hardware/software codesign, exploring novel designs and optimizations across the AI stack, including models, frameworks, cloud infrastructure, and hardware. The role involves practical implementation skills for efficient, scalable computational kernels and aims to contribute to mid- and long-term product innovations. | Serve | 7 |
| Research Intern - AI Hardware Research Intern role focused on AI Hardware, specifically researching chip architectures for efficient AI serving and inference systems. Collaborates with researchers to increase performance and efficiency of cutting-edge inference systems. | Serve | 7 |
| Research Intern - Azure Research - Systems Research Intern role focused on next-generation cloud and AI systems, with a focus on improving efficiency, reliability, and usability of Microsoft's online services and datacenters. Projects include efficient GPU/LLM deployments, AIOps, and serverless computing. The role involves research, prototyping, evaluation, and potential publication. | Serve | 7 |
| Research Intern - MSR Software-Hardware Co-design Research intern role focused on pioneering technologies for AI/ML workloads, specifically improving efficiency, security, and robustness of GPU memory systems, agentic AI systems, and software architecture for hardware accelerators. The role involves fast-paced execution, implementation, and evaluation on Azure platforms, with a focus on systems and AI. | Serve | 7 |