Principal Critical Environment Mechanical Engineer

Microsoft · Big Tech · Mt Pleasant, WI +1 · Mechanical Engineering

This role is for a Principal Critical Environment Mechanical Engineer responsible for technical leadership, operational excellence, reliability, troubleshooting, and safety within Microsoft's cloud datacenters. The role involves providing expert guidance, leading technical response efforts, establishing engineering standards, and driving complex cross-functional programs.

What you'd actually do

  1. Serve as a principal technical authority for mechanical systems, providing strategic direction and expert guidance during complex operational, reliability, capacity, and design challenges.
  2. Lead technical response efforts during high-impact operational events, providing field presence, troubleshooting, risk assessment, and leadership to drive rapid resolution and restore capacity.
  3. Apply advanced engineering judgment to ambiguous and high-pressure situations to protect customer capacity, safety, security, and operational continuity.
  4. Drive good stewardship of customer capacity by balancing operational risk, reliability, maintainability, scalability, and business requirements.
  5. Influence technical strategy, engineering priorities, and operational decision making through collaboration across regional and global stakeholder groups.

Skills

Required

  • Mechanical engineering principles
  • Datacenter operations
  • Critical environment systems
  • Technical leadership
  • Problem-solving
  • Troubleshooting
  • Risk assessment
  • Root cause analysis
  • Engineering standards
  • Project management
  • Cross-functional collaboration
  • Communication

Nice to have

  • Environmental sustainability
  • Emerging technologies
  • Systems thinking
  • Talent development

What the JD emphasized

  • critical environment
  • mechanical engineer
  • principal technical authority
  • high-impact operational events
  • customer capacity
  • operational risk
  • engineering standards
  • complex infrastructure challenges
  • major incidents
  • service disruptions
  • escalations
  • significant operational events
  • systemic risks
  • long-term system reliability
  • incident investigations
  • technical governance
  • high-risk activities
  • safety-first culture
  • operational controls
  • security expectations
  • system resilience
  • strategic objectives
  • technical leadership