Service Engineering IC3

Microsoft · Big Tech · Hyderabad, TS, IN · Service Engineering

This role manages customer-facing communications during high-severity incidents in Microsoft Azure. The Customer Reliability Engineering (CRE) team within Azure EngOps focuses on operational excellence, quality, reliability, security, and customer trust. Responsibilities include serving as the primary author and approver of customer communications, keeping those communications transparent and actionable, partnering with incident commanders, reviewing telemetry, participating in on-call rotations, contributing to Post-Incident Reviews (PIRs), and enhancing automation for service notifications. The ideal candidate has experience in cloud operations, technical communications, or incident response, along with exceptional written communication skills and the ability to lead under pressure.

What you'd actually do

  1. Serve as the primary author and approver of customer-facing communications during service incidents (SEV0/SEV1/SEV2), coordinating across Engineering, Support, PM, and Field.
  2. Ensure every message to customers reflects transparency, empathy, and actionability, even in high-pressure and fast-moving environments.
  3. Actively partner with Incident Commanders to stay synchronized on technical developments and customer impact during live-site events.
  4. Review telemetry, support signals, and field input to guide communication strategy and tailor messaging to affected audiences.
  5. Participate in the on-call rotation as a Customer Communications Lead, contributing to a 24/7 response model.

Skills

Required

  • 5+ years of experience in cloud operations, technical communications, incident response, or SRE roles on platforms such as Azure, AWS, or GCP.
  • Experience in a 24x7x365 enterprise environment.
  • Exceptional written communication skills—able to distill complex technical topics into clear, concise, and customer-appropriate language under pressure.
  • Cross-team collaboration skills—able to align stakeholders and drive messaging consensus across Engineering, Comms, Support, and Field.
  • Demonstrated ability to make quick decisions and prioritize customer needs during ambiguity and chaos.
  • Understanding of incident management frameworks (e.g., ITIL) and customer communication strategies during high-impact events.
  • Strategic thinking and a customer-first mindset; able to advocate for improvements in platform transparency and experience.
  • Excellent problem-solving, judgment, and decision-making skills.
  • BS/BA in Communications, History, English, Engineering, Computer Science, or equivalent experience.

Nice to have

  • Familiarity with service health platforms and tooling for communicating incident status at scale (e.g., Azure Service Health, SHP, ICET, Status Page).
  • 3+ years of experience managing or leading customer communications for high-severity incidents or outages.
  • Prior experience as an Incident Commander, Crisis Comms Manager, or Live Site Engineering lead.
  • Familiarity with cloud resiliency patterns, failover models, and recovery scenarios.
  • Experience with AI/ML-based tooling for incident detection, log correlation, or predictive alerting is a plus.
  • Certifications in cloud technologies (Azure, AWS, GCP), ITIL, or SRE frameworks are desirable.

What the JD emphasized

  • Exceptional written communication skills
  • Customer-facing communications
  • High-severity incidents
  • Incident management frameworks
  • Customer communication strategies