Site Reliability Engineer II - CTJ - Secret

Microsoft · Big Tech · Redmond, WA +1 · Site Reliability Engineering

Site Reliability Engineer II for Microsoft's IDEAS organization, focusing on automation, incident response, and reliability improvements for services in regulated government cloud environments. Supports Microsoft 365, Azure, and Windows platforms.

What you'd actually do

  1. Participate as a Designated Responsible Individual (DRI) in a 24x7 on-call rotation, monitoring service health, responding to incidents within defined SLAs, and contributing to post-incident reviews and learning.
  2. Design, build, and maintain automation for deployment, operations, and incident mitigation to improve reliability and reduce manual effort.
  3. Instrument services for observability; collect and analyze telemetry and health signals; and use data to guide reliability and performance improvements.
  4. Collaborate with engineering partners and stakeholders to align on goals, share operational insights, and deliver user-focused solutions.
  5. Apply engineering best practices for development, scaling, and operational excellence to meet performance and customer requirements.

Skills

Required

  • Master's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration OR equivalent experience.
  • Bachelor's Degree in Computer Science or a related technical discipline, with proven experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR equivalent experience.
  • Experience with automation, live site operations, and incident response in large-scale cloud or distributed systems.
  • Proficiency in at least one programming or scripting language (for example: C#, Java, Python, or PowerShell).
  • Strong analytical and problem-solving skills, including experience using telemetry and operational data to inform decisions.
  • Effective written and verbal communication skills, and experience collaborating across teams and disciplines.
  • Ability to meet Microsoft, customer, and/or government security screening requirements, including passing the Microsoft Cloud Background Check upon hire and periodically thereafter.
  • Active U.S. Government Secret Security Clearance
  • U.S. citizenship

Nice to have

  • Bachelor's or Master's degree in Computer Science, Information Technology, or a related field, or equivalent practical experience, with a minimum of 4 years of experience in Site Reliability Engineering or a closely related role.
  • Experience with observability and monitoring systems, including MELT (Metrics, Events, Logs, and Traces) practices.

What the JD emphasized

  • regulated government cloud environments
  • security screening requirements
  • Secret Security Clearance