Senior Software Engineer

Microsoft · Big Tech · United States · Software Engineering

Senior Software Engineer on the Azure Compute Availability Platform team, focusing on ensuring high VM uptime through intelligent automation, predictive failure models, and AI-driven diagnostics. The role involves designing, developing, and integrating AI/ML solutions into hyperscale cloud infrastructure, with an emphasis on reliability, availability, and operational efficiency. Explores generative AI for diagnostics and root cause analysis, and integrates large-scale AI models into control plane services for smarter repair decisions.

What you'd actually do

  1. Partners with appropriate stakeholders across teams and orgs to determine project requirements. Leads the design and architecture of change management features and services in Azure Compute.
  2. Identifies dependencies and authors design documents for features and services. Leverages expertise with appropriate stakeholders to develop project plans, release plans, and work items.
  3. Develops high-quality, extensible, maintainable code and coaches others to do the same. Supports livesite as Designated Responsible Individual (DRI), mentoring engineers across products/solutions and working on-call to monitor the system/product/service for degradation, downtime, or interruptions.
  4. Proactively seeks new knowledge and adapts to new trends, technical solutions, and patterns that will improve the availability, reliability, efficiency, observability, and performance of products, while driving consistency in monitoring and operations at scale and sharing knowledge with other engineers.
  5. Collaborates with data scientists and ML engineers to design and integrate predictive models that proactively detect hardware anomalies and trigger live migrations, improving VM uptime and SLA compliance. Leads initiatives to embed AI-driven diagnostics and root cause analysis into availability services, reducing time-to-resolution for incidents and improving operational efficiency.

Skills

Required

  • Bachelor's Degree in Computer Science or related technical field
  • 4+ years technical engineering experience
  • Experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
  • Ability to meet Microsoft, customer and/or government security screening requirements

Nice to have

  • Master's Degree in Computer Science or related technical field
  • 6+ years technical engineering experience
  • 8+ years technical engineering experience

What the JD emphasized

  • ensuring every Azure virtual machine achieves a Service Level Agreement (SLA) of 99.99 percent or higher
  • predictive failure models
  • proactively live-migrate virtual machines before failures occur
  • AI-driven diagnostics
  • root cause analysis
  • accelerate incident resolution
  • intelligent observability pipelines
  • anomaly detection
  • trend analysis
  • large-scale AI models
  • smarter, context-aware repair decisions

Other signals

  • integrates large-scale AI models into control plane services
  • builds intelligent observability pipelines
  • collaborates with data scientists and ML engineers to design and integrate predictive models
  • drives the adoption of generative AI tools