Site Reliability Engineer II

Microsoft · Big Tech · Hyderabad, TS, IN +1 · Site Reliability Engineering

This is a Site Reliability Engineer II role on the Azure Data engineering team, focused on the Analysis Services Engine that powers Power BI. The role ensures the reliability, scalability, availability, and performance of large-scale services through automation, observability, and incident response. It involves working with high-throughput, multi-tenant services, collaborating with partner teams, and participating in on-call rotations. Core responsibilities include system reliability, incident management, performance monitoring, automation, capacity planning, and continuous improvement.

What you'd actually do

  1. Work with all aspects of a high-throughput, multi-tenant service.
  2. Collaborate effectively within the team and with partner teams across Microsoft.
  3. Be part of the on-call rotation for maintaining service health.
  4. Design, implement, and refine chosen solutions in close partnership with Product Management and partner teams.
  5. Champion operational excellence via established metrics, process governance, and policy controls for regular assessment and improvement.

Skills

Required

  • software engineering
  • network engineering
  • systems administration
  • large-scale cloud or distributed systems

Nice to have

  • Computer Science
  • Information Technology

What the JD emphasized

  • high throughput
  • multi-tenant service
  • service health
  • operational excellence
  • high availability
  • system failures
  • system health
  • manual work
  • handle demand
  • failures