Cloud Solution Architecture - Infrastructure

Microsoft · Big Tech · Kuala Lumpur, Singapore +2 · Cloud Solution Architecture

The Cloud Solution Architect (CSA) acts as a trusted technical advisor to Microsoft's strategic customers, helping them improve the reliability, resilience, security, performance, and operational excellence of their Azure environments. The work combines proactive assessments, technical guidance, incident leadership, and cross-functional collaboration within a global follow-the-sun model. The CSA advises on architecture and operations aligned with the Azure Well-Architected Framework, leads complex troubleshooting efforts, facilitates Root Cause Analysis, and reduces reactive operational demand through reliability-focused recommendations and operational maturity improvements. The role also covers proactive health assessments, risk reviews, and analysis of data from telemetry and monitoring platforms to identify trends and develop actionable insights. Customer engagement includes creating knowledge documentation, delivering onboarding assessments, and tracking remediation progress. Collaboration with Microsoft teams across regions is essential.

What you'd actually do

  1. Act as a trusted technical advisor, helping customers improve the reliability, resiliency, security, performance, and operational maturity of mission-critical workloads running on Azure.
  2. Lead complex troubleshooting efforts across infrastructure, platform, and application layers, including critical and high-severity incidents.
  3. Perform proactive health assessments, risk reviews, and operational analysis to identify opportunities for improvement and to prevent escalations.
  4. Drive operational maturity through recommendations for observability, monitoring, automation, governance, reliability engineering practices, disaster recovery preparedness, and service management processes.
  5. Operate effectively within a global follow-the-sun support model, collaborating with teams across multiple regions and time zones to ensure continuity of service for mission-critical workloads.

Skills

Required

  • Deep technical expertise in Azure environments
  • Strong customer advocacy skills
  • Ability to navigate complex operational challenges
  • Excellent communication skills (technical and executive audiences)
  • Experience with incident leadership and troubleshooting
  • Knowledge of the Azure Well-Architected Framework
  • Experience with reliability, resiliency, security, and performance best practices
  • Familiarity with telemetry, monitoring, and observability tools
  • Ability to work effectively in a global, cross-time-zone environment

Nice to have

  • Experience in regulated environments (e.g., fintech, healthcare)
  • Knowledge of specific Azure services relevant to mission-critical workloads

What the JD emphasized

  • mission-critical workloads
  • Azure Well-Architected Framework
  • critical and high-severity incidents
  • Root Cause Analysis
  • telemetry
  • monitoring platforms
  • observability tools
  • operational maturity
  • reliability engineering practices
  • disaster recovery preparedness
  • service management processes
  • global follow-the-sun support model