Principal Software Engineer, CoreAI

Microsoft · Big Tech · Redmond, WA +1 · Software Engineering

This role focuses on building and optimizing high-performance runtime systems for large-scale LLM inferencing, specifically for OpenAI chat and multimodal AI models. The engineer will be responsible for systems-level optimization, microservice design, and meeting latency, throughput, cost, and reliability targets for AI inference pipelines.

What you'd actually do

  1. Design and implement high-performance microservices and runtime components in C++.
  2. Optimize AI inferencing systems for latency, throughput, cost, and reliability at large scale.
  3. Debug and resolve complex production issues related to performance, scaling, and service reliability.
  4. Collaborate with cross-functional partners to integrate model inference pipelines into scalable infrastructure.
  5. Contribute to state-of-the-art multimodal inferencing systems supporting text, speech, and vision workloads.

Skills

Required

  • C++
  • systems programming
  • scalable cloud services
  • distributed systems
  • Kubernetes
  • containerized workloads
  • LLM inferencing infrastructure
  • CUDA

Nice to have

  • distributed GPU/CPU stack optimization for AI model inference
  • Azure OpenAI
  • SRE principles
  • operational excellence

What the JD emphasized

  • deep C++ expertise
  • large-scale LLM inferencing
  • multimodal AI models
  • systems-level optimization

Other signals

  • high-performance runtime systems