Principal Software Engineer

Microsoft · Big Tech · United States · Software Engineering

This role focuses on leading the development and architecture of critical monitoring and alerting systems for Microsoft 365 Resilience, with a specific emphasis on Copilot scenarios. The Principal Software Engineer will ensure the reliability, resilience, performance, and scalability of the platform, driving strategic investments in areas like AI Evaluations and Observability.

What you'd actually do

  1. Lead product development and scaling to meet customer requirements, apply best practices for meeting scaling needs and performance expectations, and hold accountability for products that do not meet expectations.
  2. Partner with stakeholders (e.g., PM, DS, Leadership) to determine user requirements within and across teams.
  3. Proactively seek new knowledge and adapt to new trends, technical solutions, and patterns that improve the availability, reliability, efficiency, observability, and performance of products; drive consistency in monitoring and operations at scale; and share knowledge with other engineers.
  4. Guide the team and lead the identification of dependencies and the development of design documents for a product, application, service, or platform.
  5. Guide the team to drive multiple group project plans, release plans, and work items in coordination with appropriate stakeholders.

Skills

Required

  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, or Python
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check

Nice to have

  • Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, or Python
  • Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, or Python
  • Analytical mindset with a data-driven approach to problem-solving
  • Collaborative and team-oriented
  • Experience independently owning and delivering technically challenging projects with measurable impact
  • Demonstrated ability to quickly master new technologies, tools, and domains

What the JD emphasized

  • critical monitoring and alerting
  • highest priority scenarios including Copilot
  • robust reliability measures
  • monitored, measured and designed reliably
  • orchestrates the most critical paths across Microsoft 365
  • measure reliability and monitor service health with rigor
  • reliability, resilience, performance, and scalability
  • AI Evaluations
  • Reliability & Resilience
  • Observability & Intelligent Cloud
  • operational excellence
  • meet scaling needs and performance expectations
  • availability, reliability, efficiency, observability, and performance
  • monitoring and operations at scale
  • mitigate system/product/service degradation to avoid downtime or interruptions
  • Analytical mindset with a data-driven approach to problem-solving
  • consistently upholding high standards of quality and engineering rigor
  • Collaborative and team-oriented
  • articulating complex ideas across disciplines, levels, and product areas
  • independently owning and delivering technically challenging projects with measurable impact
  • quickly master new technologies, tools, and domains