Currently tracking 427 active AI roles, up 208% versus the prior 4 weeks. Primary focus: Agent · Engineering. Salary range $65k–$331k (avg $193k).
| Title | Stage | AI score |
|---|---|---|
| Principal Software Engineer Principal Software Engineer to advance ad-serving infrastructure, focusing on performance, efficiency, and scalability of next-gen model serving and inference platforms for Ads. Designs and optimizes high-performance serving systems and GPU inference frameworks for deep learning and LLM workloads. | Serve | 7 |
| Senior Principal Engineering Manager Lead and grow a team building and operating world-class research compute infrastructure, including large-scale GPU clusters and agentic development tools, for Microsoft Research globally. | Serve | 7 |
| Senior Software Engineer--Backend--Microsoft Copilot Senior Software Engineer for Microsoft Copilot's backend platform, focusing on scaling AI services, integrating AI models, and providing tools for engineers. Requires strong backend and cloud infrastructure experience. | Serve | 7 |
| Principal Product Manager/Architect - Foundry Inference Platform (CoreAI) The Principal Product Manager/Architect will define and guide the technical architecture of Microsoft Foundry, an AI inferencing platform focused on reliability, scalability, and efficiency for large-scale GPU fleets. The role involves setting product direction for reliability, GPU fleet efficiency, capacity management, and engaging with strategic customers. Success metrics include platform reliability, GPU utilization, and customer outcomes. | Serve | 7 |
| Senior Software Engineer - CoreAI Model Inference & Serving Senior Software Engineer role focused on building and scaling the AI data-plane for LLM inferencing across Microsoft and Azure. The role involves designing, coding, and shipping core serving systems, smart routing, and request distribution for a wide range of LLMs, aiming for reliability, efficiency, and ultra-low latency. | Serve | 7 |
| Principal Software Engineering--Backend--Microsoft Copilot This role focuses on building and scaling the backend platform for Microsoft Copilot, integrating with AI models and empowering Copilot teams. The engineer will design, develop, and maintain performant and secure AI Platform services, ensuring reliability, scalability, and performance. The role requires experience with public cloud infrastructure, containerization, and production software development. | Serve | 7 |
| MTS - Platform Engineering Manager This role is for a Platform Engineering Manager at Microsoft AI, focusing on building and scaling the AI platform services that power Copilot. The role involves managing a team to develop secure, performant APIs for finetuning and deploying core AI experiences, collaborating with various teams, and ensuring high-quality code delivery in a fast-paced consumer-facing environment. | Serve | 7 |
| Principal Software Engineer, CoreAI This role focuses on building and operating the foundational GPU accelerated infrastructure for large-scale AI training and inference across Azure. It involves designing systems for GPU management, scheduling, isolation, and sharing, as well as optimizing performance, reliability, and utilization of GPU fleets. The role also requires driving end-to-end platform features, including observability and diagnostics, and influencing platform architecture. | Serve | 7 |
| Senior Software Engineer--Infra-Microsoft Copilot The role focuses on building and scaling the backend platform for Microsoft Copilot, including integrations with AI models and tools for engineering teams. The engineer will design, develop, and maintain performant and secure AI Platform services, ensuring reliability, scalability, and performance. This involves working with public cloud infrastructure, containerization technologies, and production software release. | Serve | 7 |
| Principal Product Manager Principal Product Manager for Azure AI Foundry and Azure ML, shaping strategy for AI/ML and GenAI platforms including training, deployment, monitoring, and governance. Focuses on developer-centric AI platforms enabling organizations to build, deploy, and operate AI systems at scale. | Serve · Post-train | 7 |
| Product Manager II - Foundry Model Inference (CoreAI) Product Manager II for Microsoft Foundry, focusing on the AI-first application stack, model serving platform, and generative AI development. The role involves defining product offerings, identifying quality improvement opportunities, tracking metrics, and collaborating with engineering and go-to-market teams to deliver integrated solutions for customers, including those in highly regulated industries. | Serve | 7 |
| Principal Researcher - Systems & Networking - Microsoft Research Principal Researcher in Systems and Networking with a focus on AI-driven methods for systems innovation, performance, efficiency, and scalability. The role involves developing new methodologies, collaborating with cross-functional teams, and publishing research findings. | Serve | 7 |
| Principal Researcher - Cloud and AI Infrastructure - Microsoft Research Principal Researcher at Microsoft Research focused on advancing cloud and AI infrastructure architecture, and chip design using AI technologies. The role involves investigating hardware trends, designing and optimizing hardware components, conducting simulations, developing prototypes, and collaborating with cross-functional teams to integrate intelligent systems across computing layers. | Serve | 7 |
| Firmware Engineer Firmware Engineer role focused on designing, developing, and debugging firmware for Azure's custom AI accelerator silicon. This involves working across silicon, hardware, and software teams to enable advanced AI workloads and support data center deployment. | Serve | 7 |
| Member of Technical Staff, Site Reliability Engineer (HPC) - MAI SuperIntelligence Team The role is for a Site Reliability Engineer (SRE) focused on High Performance Computing (HPC) infrastructure for AI model training and inference. The engineer will ensure the reliability, availability, and efficiency of large-scale distributed AI systems, including GPU clusters, and will be involved in monitoring, automation, incident management, and security. | Serve | 7 |
| Member of Technical Staff, HPC Operations Engineering Manager This role manages a team of Site Reliability Engineers responsible for the reliability and efficiency of large-scale distributed AI infrastructure, specifically for training, fine-tuning, and serving generative AI models. The focus is on leading operations, observability, automation, incident management, and security within hybrid cloud/on-prem CPU+GPU environments, collaborating closely with ML engineers and platform teams. | Serve · Post-train | 7 |
| Software Engineer Software Engineer role focused on building and scaling the inferencing cloud for Large Language Models and GenAI Services within Azure CoreAI Platform. The role involves designing, building, and operating large-scale engineering systems for AI models. | Serve | 7 |
| Senior Software Engineer Senior Software Engineer role focused on designing, developing, and optimizing Azure's High Performance Computing and AI Platform (HPC/AI) virtual machines. This involves deep technical work on hardware/software interactions, device virtualization, and performance analysis of GPU workloads for large-scale AI training and inference. The role contributes to the underlying platform software and its exposure as an Azure service, with opportunities to work on upper layers of Azure infrastructure. | Serve | 7 |
| Research Intern - AI Systems and Tools Research Intern role focused on developing AI systems and tools, particularly developer tools for Microsoft's custom Maia AI hardware. This involves working on profilers, debuggers, performance analysis tools, and simulators to enable efficient execution of AI models on AI accelerators. The role collaborates with AI researchers, hardware teams, and AI compilers teams, and involves work on device firmware, host software, and integration with AI/ML frameworks. | Serve | 7 |
| Senior Software Engineer This role focuses on designing and developing next-generation networking infrastructure for large-scale AI training and inference in Azure Cloud. The engineer will work on high-performance, low-latency, and low-jitter communication frameworks, optimizing scalability and reliability for distributed AI workloads. | Serve | 7 |
| Principal Software Engineer Principal Software Engineer role focused on designing, developing, and optimizing networking infrastructure for large-scale AI training and inference in Azure Cloud. The role emphasizes high performance, low latency, and reliability for distributed AI workloads, working with AI accelerators and advanced networking technologies. | Serve | 7 |
| Senior Software Engineer The role focuses on designing and building cutting-edge networking infrastructure for large-scale AI training and inference in Azure Cloud. The goal is to enable breakthroughs in AI by delivering unmatched computational power, scalability, and reliability, with a focus on high performance, low latency, and minimal jitter for distributed AI workloads. | Serve | 7 |
| Member of Technical Staff - Software Engineer (SuperIntelligence team) This role focuses on building and operating the core platform infrastructure for training, evaluating, and deploying large-scale AI models within Microsoft. It involves designing scalable services for cluster orchestration, job scheduling, data pipelines, and artifact management, with a strong emphasis on production operations, cloud platforms (Azure), and enhancing developer experience for AI research and engineering teams. | Serve | 7 |
| Member of Technical Staff, Hardware Health - MAI Superintelligence Team This role is focused on ensuring the reliability, performance, and availability of Microsoft's large-scale AI training infrastructures, which involve tens of thousands of GPUs and advanced networking. The responsibilities include designing transport, fabric architecture, telemetry, observability, and automated troubleshooting for these clusters. The role also involves AI training and inference cluster bring-up, performance benchmarking, and root-cause analysis, with a goal of developing predictive health models and autonomous remediation systems. | Serve | 7 |
| Research Intern - AI System Architecture Modeling and Performance Research Intern role focused on AI system architecture modeling and performance within Azure's hyperscale infrastructure. The intern will evaluate hardware/software co-design opportunities, optimize CPU, GPU, and networking infrastructure for AI accelerators, and develop methodologies for performance analysis and architectural idea evaluation. | Serve | 7 |
| Research Intern - AI Hardware Research Intern role focused on AI Hardware, specifically on chip architectures for efficient AI serving and inference systems. The role involves research, analysis, documentation, and innovation in collaboration with researchers and engineers. | Serve | 7 |
| Research Intern - AI Frameworks (Network Systems and Tools) Research intern focused on next-generation AI systems, specifically exploring disaggregated inference, memory architectures, and interconnect technologies for LLM serving, with a focus on request scheduling and KV caching optimizations. The role involves investigating and evaluating disaggregated KV cache architectures and building a P2P KV cache sharing service architecture. | Serve | 7 |
| Technical Program Manager - Infrastructure Technical Program Manager for AI Infrastructure at Microsoft AI, focusing on building and optimizing platforms for large-scale foundation model training, deployment, and serving. The role involves coordinating projects, collaborating with researchers and engineers, and driving progress in a 0->1 environment. | Serve · Post-train | 7 |
| Senior Researcher - Systems and Networking, Microsoft Research Senior Researcher in Systems and Networking at Microsoft Research, focusing on AI-driven methods for system innovation, performance, efficiency, and scalability. The role involves developing and implementing new methodologies, collaborating with cross-functional teams, and publishing research findings. Requires a Doctorate and background in systems/networking with knowledge of ML systems, databases, and networking technologies, including Agent Systems and Vector Databases. | Serve | 7 |
| Research Intern - Hardware/Software Codesign Research intern focused on advancing the efficiency of AI systems through hardware/software codesign, exploring novel designs and optimizations across the AI stack, including models, frameworks, cloud infrastructure, and hardware. The role involves practical implementation skills for efficient, scalable computational kernels and aims to contribute to mid- and long-term product innovations. | Serve | 7 |
| Research Intern - AI Hardware Research Intern role focused on AI Hardware, specifically researching chip architectures for efficient AI serving and inference systems. Collaborates with researchers to increase performance and efficiency of cutting-edge inference systems. | Serve | 7 |
| Research Intern - Azure Research - Systems Research Intern role focused on next-generation cloud and AI systems, with a focus on improving efficiency, reliability, and usability of Microsoft's online services and datacenters. Projects include efficient GPU/LLM deployments, AIOps, and serverless computing. The role involves research, prototyping, evaluation, and potential publication. | Serve | 7 |
| Research Intern - MSR Software-Hardware Co-design Research intern role focused on pioneering technologies for AI/ML workloads, specifically improving efficiency, security, and robustness of GPU memory systems, agentic AI systems, and software architecture for hardware accelerators. The role involves fast-paced execution, implementation, and evaluation on Azure platforms, with a focus on systems and AI. | Serve | 7 |
| Service Engineer II Service Engineer II role focused on building AI-driven automation and full-stack tools to improve the reliability, quality, and customer experience of the Microsoft Advertising platform. The role involves deep technical investigations, debugging distributed systems, writing production-quality code, and partnering with engineering teams. Experience with AI-based coding tools is required. | Serve | 5 |
| Consultant A2 - Infra This role focuses on designing, building, and optimizing end-to-end cloud and on-premises infrastructure solutions, with a significant emphasis on supporting AI/ML workloads. The consultant will leverage Azure AI Services, containerized AI workloads, and integrate models into cloud environments, acting as a full-stack infrastructure consultant. | Serve | 5 |
| Consultant - Cloud Infra & AI This role focuses on designing, building, and optimizing cloud and on-premises infrastructure solutions, with a specific emphasis on leveraging Azure AI Services as infrastructure components for applications. The consultant will integrate OpenAI/Frontier models into cloud environments using secure and scalable patterns, requiring proficiency in Python for automation and AI integration, and experience with AI developer tools. The role also involves cloud strategy, networking, DevOps, and security practices within Azure. | Serve | 5 |
| Sr Consultant - Infra Sr. Consultant focused on designing, building, and optimizing cloud and on-premises infrastructure solutions, with a specific emphasis on AI workloads. This role requires expertise in Azure AI Services, integrating frontier models, and managing AI developer tools as infrastructure components. The consultant will ensure secure, scalable, and high-performing environments for AI applications. | Serve | 5 |
| Cloud & AI Platform - AI Infrastructure Cloud Solution Architect This role focuses on designing, implementing, and scaling AI-ready Azure infrastructure platforms for enterprise clients. The Cloud Solution Architect will ensure secure, resilient, and cost-governed foundations for AI workloads, including GPU platforms, networking, storage, and security. The role involves leading technical engagements, removing blockers, and industrializing repeatable patterns to support production-grade AI solutions and accelerate Microsoft Cloud consumption. | Serve | 5 |
| Principal Software Engineer Principal Software Engineer role focused on leading the architecture, design, and implementation of high-scale, low-latency services with an AI-first approach within Microsoft's Identity engineering team. The role involves driving AI/ML-based engineering solutions, cloud environments (Azure), and large distributed systems, with a strong emphasis on security and reliability. | Serve | 5 |
| Senior Software Engineer Senior Software Engineer to design and build a Postgres-based database for modern, AI-native, agent-driven workloads within Microsoft Fabric. The role involves innovating on query planning, execution, and storage layers to support high-performance data access for next-generation applications, extending PostgreSQL with open engines and formats. | Serve | 5 |
| Software Engineering - CTJ - Poly Software Engineer role focused on building and maintaining secure, scalable cloud infrastructure for Azure AI Platform workloads, specifically for air-gapped and sovereign cloud environments. The role involves designing, developing, and operating foundational services that power Azure Machine Learning, Azure AI Services, Azure OpenAI, and Microsoft Foundry capabilities, leveraging AI-assisted development tools and adhering to strong engineering fundamentals. | Serve | 5 |
| MTS - Backend Engineer Backend Engineer role focused on building and scaling the core backend platform, including Orchestrator, Inference, and APIs, for Microsoft's personalized AI assistant, Copilot. The role involves developing consumer-facing AI-powered features and experiences in a fast-paced environment. | Serve | 5 |
| Principal Software Engineer - Azure AI Translation & Language Team The Principal Software Engineer will design and implement large-scale distributed systems for Azure AI Translation and Language services, focusing on infrastructure for model inference, service reliability, and platform architecture. This role involves defining and evolving platform architecture for high availability, scalability, and performance, driving improvements in reliability and operational excellence, and building core infrastructure components. Collaboration with applied science and product teams is key, as is mentoring engineers. | Serve | 5 |
| Senior Software Engineer - Data Platform, AI Infrastructure This role focuses on building and operating the core infrastructure layer of a large-scale, productized data platform that powers critical insights and systems across Azure-based services for AI Infrastructure. The platform processes terabytes to petabytes of data daily and requires a focus on orchestration, APIs, observability, and system reliability. | Serve | 5 |
| Senior Software Engineer Senior Software Engineer to build and run scalable services for Microsoft 365, focusing on real-time communication products like Teams. The role involves applying large data and machine learning techniques to improve services, designing media streaming components, and analyzing production telemetry. | Serve | 5 |
| Software Engineer II Software Engineer II role in the Intelligent Conversation and Communications Cloud (IC3) team at Microsoft, focusing on building and running scalable services for Microsoft 365 products like Teams. The role involves developing real-time media stack components, designing client and server media streaming/communication components, and applying machine learning techniques for system improvements. | Serve | 5 |
| Software Engineer II Software Engineer II role focused on designing, developing, and optimizing networking infrastructure for large-scale AI training and inference in Azure Cloud. The role involves ensuring high performance, low latency, and minimal jitter for distributed AI workloads, working with cutting-edge networking hardware and software. | Serve | 5 |
| Principal Software Engineer Principal Software Engineer on the Ads Data Platform Team, which powers Microsoft's global ads marketplace. The role focuses on building and advancing the core capabilities of the Ads serving stack, a high-scale, low-latency, geo-distributed system involving large-scale machine learning inference for ad ranking and real-time bidding infrastructure. | Serve | 5 |
| Senior Software Engineer - CTJ - Poly Senior Software Engineer to deliver secure, scalable, and mission-critical AI infrastructure for Microsoft’s sensitive cloud environments, focusing on foundational services for Azure Machine Learning, Azure AI Services, Azure OpenAI, and Microsoft Foundry. The role involves building and operating AI-native full-stack systems, leveraging modern tooling and AI systems to accelerate development and enhance product quality within air-gapped, sovereign, and commercial clouds. | Serve | 5 |
| Member of Technical Staff - Backend Engineer Backend Engineer for Microsoft Copilot, focusing on building and scaling the core backend platform including Orchestrator, Inference, and APIs to power AI-driven consumer experiences. The role involves developing secure, performant APIs, collaborating with cross-functional teams, and shipping high-quality code in a fast-paced environment. | Serve · Agent | 5 |