What you'd actually do

Push the boundaries: Apply cutting-edge Large Language Model (LLM) and agentic technology to solve reliability challenges in cloud and AI systems.

Innovate in failure diagnosis and prevention: Build novel tools for monitoring, logging, and troubleshooting at scale.

Validate your ideas in the wild: Integrate and evaluate your solutions on real Microsoft services and incidents.

Skills

Required

Current enrollment in a PhD program in Computer Science or a related STEM field
Experience building scalable and reliable systems
Ability to develop an original research agenda
Ability to collaborate effectively with other researchers and product development teams

Nice to have

Strong interpersonal skills, including cross-group and cross-culture collaboration
Ability to think unconventionally to derive creative and innovative solutions

Overview

Research Internships at Microsoft provide a dynamic environment for research careers with a network of world-class research labs led by globally-recognized scientists and engineers, who pursue innovation in a range of scientific and technical disciplines to help solve complex challenges in diverse fields, including computing, healthcare, economics, and the environment.

Are you passionate about building the future of reliable, large-scale cloud and AI systems? The Systems Reliability Group at Microsoft Research is looking for motivated Research Interns to tackle cutting-edge challenges at the intersection of distributed systems, AI systems, and software engineering.

We tackle some of the toughest challenges in modern computing—designing innovative reliability mechanisms, building scalable debugging tools, and leveraging AI to improve system dependability. We also explore how to ensure the reliability of AI systems themselves, a critical frontier as AI becomes integral to cloud services.

As a Research Intern, you’ll have the opportunity to:

Dive into real-world systems: Work with large-scale codebases, configurations, and deployments powering Microsoft Azure and Office 365.
Analyze production data: Discover how real cloud systems fail—and design strategies to prevent it.
Push the boundaries: Apply cutting-edge Large Language Model (LLM) and agentic technology to solve reliability challenges in cloud and AI systems.
Innovate in failure diagnosis and prevention: Build novel tools for monitoring, logging, and troubleshooting at scale.
Validate your ideas in the wild: Integrate and evaluate your solutions on real Microsoft services and incidents.

Why Join Us?

Collaborate with world-class researchers and engineers at Microsoft Research.
Partner with Azure and Office 365 product teams to bring your ideas to life.
Access thousands of real-world software projects to test and refine your innovations.
Publish your work in top-tier systems conferences and make a lasting impact on the industry.

If you have a systems background, a passion for AI and reliability, and a drive to solve practical challenges at global scale, we’d love to hear from you!

Responsibilities

Research Interns put inquiry and theory into practice. Alongside fellow doctoral candidates and some of the world’s best researchers, Research Interns learn, collaborate, and network for life. Research Interns not only advance their own careers, but they also contribute to exciting research and development strides. During the 12-week internship, Research Interns are paired with mentors and expected to collaborate with other Research Interns and researchers, present findings, and contribute to the vibrant life of the community. Research internships are available in all areas of research, and are offered year-round, though they typically begin in the summer.

Qualifications

Required Qualifications

Currently enrolled in a PhD program in Computer Science or a related STEM field.

Other Requirements

Research Interns are expected to be physically located in their manager’s Microsoft worksite location for the duration of their internship.
In addition to the qualifications below, you’ll need to submit a minimum of two reference letters for this position as well as a cover letter and any relevant work or research samples. After you submit your application, a request for letters may be sent to your list of references on your behalf. Note that reference letters cannot be requested until after you have submitted your application, and furthermore, that they might not be automatically requested for all candidates. You may wish to alert your letter writers in advance, so they will be ready to submit your letter.

Preferred Qualifications

Experience building scalable and reliable systems.
Demonstrated ability to develop an original research agenda.
Ability to collaborate effectively with other researchers and product development teams.
Strong interpersonal skills, including cross-group and cross-culture collaboration.
Ability to think unconventionally to derive creative and innovative solutions.

The base pay range for this internship is USD $6,710 - $13,270 per month. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $8,760 - $14,360 per month.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-intern-pay

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.

This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.

Research Intern - Reliability of Cloud and AI Systems
