What you'd actually do

Creates and implements code for a product, service, or feature, reusing code as applicable with minimal supervision. Writes and learns to create code that is extensible and maintainable. Considers diagnosability, reliability, and maintainability with few defects, and understands when the code is ready to be shared and delivered. Applies coding patterns and best practices to write code (e.g., leveraging state-of-the-art generative artificial intelligence [GenAI], approaches to source code organization, naming conventions).

Acts as a designated responsible individual (DRI), working on-call to monitor a system/product feature/service for degradation, downtime, or interruptions. Alerts stakeholders as to the status and gains approval to restore system/product/service for simple problems. Responds within service level agreement (SLA) timeframe. Escalates issues to appropriate owners.

Maintains operations of live site service on a rotational, on-call basis, following security best practices and using the minimum required permissions when responding quickly to mitigate issues that arise. Identifies solutions and mitigations to simple issues, and to complex issues when applicable, that impact performance or functionality of live site services, and escalates appropriately. With minimal supervision, improves troubleshooting guides (TSGs), wikis, tests, and telemetry to make on-call better, and recommends user-facing support documentation and additional test coverage to reduce the likelihood of future user-initiated incidents.

Contributes to identifying dependencies, and incorporates them into the development of design documents for a product area with little oversight. Helps to actively identify other teams and technologies to leverage, how they interact, and where their own system or team can support others. Understands downstream interactions between systems.

Contributes to the identification of requirements for, and development of, automation within production and deployment of a complex product feature, targeting zero-touch deployment when possible. Runs code in simulated or other non-production environments to confirm functionality and error-free runtime for products with little to no oversight.

Overview

Build the tools and systems that make M365 sovereign cloud operations faster, smarter, and more reliable. As a Software Engineer, you'll write code that solves real operational challenges, from internal platforms and automation to agentic workflows that transform how we deliver and maintain services. Bring your coding skills and your drive to innovate, and help us engineer the next generation of sovereign cloud operations.

The M365 Sovereign Clouds organization exists to ensure that sovereign and government cloud customers have access to the same world-class productivity tools that power organizations around the globe, delivered with the highest standards of security, compliance, and operational excellence. We own and operate Microsoft 365 services including Office 365, Exchange, Outlook, Teams, SharePoint, OneDrive, Purview, Information Protection, PSTN, Office Online, and Office Services, and are actively bringing Copilot to sovereign environments. As part of Azure Silver and Microsoft Sovereign Clouds, our mission is to make secure collaboration accessible, reliable, and performant for those who need it most. We are guided by a commitment to growth mindset, innovation, collaboration, and inclusion, values that shape how we build, operate, and support our services every day. The work we do directly impacts the ability of our customers and their organizations to achieve more.

The Security & Compliance team protects sovereign cloud customers from email-borne threats and helps them meet the strictest regulatory and compliance requirements. We operate the systems that stand between adversaries and some of the most sensitive communications in the world, including Exchange Online Protection, Exchange Transport, and Microsoft Defender for Office. Our Purview Platform delivers the tools that enable customers to classify, label, protect, and govern their data across the M365 ecosystem. Every feature we build and every incident we respond to directly contributes to the security and trustworthiness of Microsoft's sovereign cloud offerings.

The right candidate for this job (is):

• Passionate about distributed systems and working with highly scalable services.

• Enjoys new technological challenges and is motivated to solve them.

• Excited about making better software and continuously improving the development, integration, and deployment processes.

• Self-starter who thrives in a bottom-up, fast-paced, highly technical environment.

• Effective collaborator, experienced in creating technical partnerships across teams.

• Committed to ensuring exceptional customer satisfaction through technical excellence.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Responsibilities

• Creates and implements code for a product, service, or feature, reusing code as applicable with minimal supervision. Writes and learns to create code that is extensible and maintainable. Considers diagnosability, reliability, and maintainability with few defects, and understands when the code is ready to be shared and delivered. Applies coding patterns and best practices to write code (e.g., leveraging state-of-the-art generative artificial intelligence [GenAI], approaches to source code organization, naming conventions).

• Acts as a designated responsible individual (DRI), working on-call to monitor a system/product feature/service for degradation, downtime, or interruptions. Alerts stakeholders as to the status and gains approval to restore system/product/service for simple problems. Responds within service level agreement (SLA) timeframe. Escalates issues to appropriate owners.

• Maintains operations of live site service on a rotational, on-call basis, following security best practices and using the minimum required permissions when responding quickly to mitigate issues that arise. Identifies solutions and mitigations to simple issues, and to complex issues when applicable, that impact performance or functionality of live site services, and escalates appropriately. With minimal supervision, improves troubleshooting guides (TSGs), wikis, tests, and telemetry to make on-call better, and recommends user-facing support documentation and additional test coverage to reduce the likelihood of future user-initiated incidents.

• Contributes to identifying dependencies, and incorporates them into the development of design documents for a product area with little oversight. Helps to actively identify other teams and technologies to leverage, how they interact, and where their own system or team can support others. Understands downstream interactions between systems.

• Contributes to the identification of requirements for, and development of, automation within production and deployment of a complex product feature, targeting zero-touch deployment when possible. Runs code in simulated or other non-production environments to confirm functionality and error-free runtime for products with little to no oversight.

• Works with appropriate internal stakeholders (e.g., product manager, privacy/security subject matter expert, technical lead) to understand and determine customer/user requirements for a set of features. Incorporates customer insights into future designs or solution fixes with minimal supervision.

• Remains current in skills by investing time and effort into being informed of current developments that will improve the availability, reliability, efficiency, observability, and performance of products while also driving consistency in monitoring and operations at scale. Conducts learning and literary sessions to raise awareness on relevant engineering design principles (e.g., security, testability, performance, scalability, accessibility, product knowledge) with minimal guidance.

Qualifications

Required/Minimum Qualifications

Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.

**Other Requirements:**

**Security Clearance Requirements:** Candidates must be able to meet Microsoft, customer, and/or government security screening requirements for this role. These requirements include, but are not limited to, the following specialized security screenings:

The successful candidate must have an active U.S. Government Top Secret Clearance with access to Sensitive Compartmented Information (SCI) based on a Single Scope Background Investigation (SSBI). The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. Failure to maintain or obtain the appropriate U.S. Government clearance and/or customer screening requirements may result in employment action up to and including termination.
Clearance Verification: This position requires successful verification of the stated security clearance to meet federal government customer requirements. You will be asked to provide clearance verification information prior to an offer of employment.
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Citizenship & Citizenship Verification: This position requires verification of U.S. citizenship due to citizenship-based legal restrictions. Specifically, this position supports United States federal, state, and/or local government agency customers and is subject to certain citizenship-based restrictions where required or permitted by applicable law. To meet this legal requirement, citizenship will be verified via a valid passport, other approved documents, or a verified U.S. government clearance.

**Preferred/Additional Qualifications**

Master's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.

Site Reliability Engineering IC3 - The typical base pay range for this role across the U.S. is USD $102,100 - $202,200 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $133,800 - $219,200 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay

This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about **requesting accommodations.**