Software Engineer II

Microsoft · Big Tech · Hyderabad, TS, IN · Software Engineering

Software Engineer II to build and operate AI Agents as Service for cloud operations, focusing on agent capabilities, orchestration, evaluation, safety, and reliability in production environments.

What you'd actually do

Take ownership of important areas of the Azure SRE Agent Platform, including agent capabilities, orchestration, evaluation, user experiences on different form factors and supporting platform services
Build and iterate on agentic systems, including tools, planning and execution loops, evaluations, and safety mechanisms
Design and ship reliable capabilities that improve incident detection, diagnosis, mitigation, and operational learning
Use telemetry, experiments, evaluations, and user feedback to guide iteration and investment
Contribute to resilient, observable systems that operate safely and effectively in production

Skills

Required

Bachelor’s or Master’s degree in Computer Science, or equivalent practical experience.
4+ years of experience building production software using one or more modern programming languages such as C#, C++, Go, Java or Python.
Strong understanding of Generative AI & software engineering fundamentals, data structures, and problem-solving.
Ability to learn new technologies quickly and adapt to deliver customer and business impact.

Nice to have

Hands-on experience building and operating LLM-powered agentic systems in production, with direct ownership of quality, reliability, and iteration
3+ years of experience building and operating cloud platforms or distributed services, with depth in service architecture, deployment, and observability
Strong product mindset with a track record of owning ambiguous problem spaces and driving them to high-quality outcomes
Solid engineering fundamentals, including systems design, performance, and debugging in complex production environments
Track record of designing, running, and optimizing evaluations for agentic systems, including tools, prompts, and agent loops
Expertise with Kubernetes, container orchestration, or cloud-native infrastructure is a strong plus
Experience contributing to or leading open-source projects at scale is a plus

What the JD emphasized

production software
agentic systems
production
evaluations
production environments

Overview

In Microsoft’s CoreAI division, the Azure SRE Agent Platform team builds and runs ‘AI Agents as Service’ that help Microsoft customers detect, diagnose, and mitigate production issues across their services and workloads running on Microsoft platforms. Think of these agents as “virtual SRE teammates” that continuously watch your systems, investigate problems, and recommend or perform fixes, with a focus on quality, safety, security, enterprise scale, and real-world impact.

Our work spans the full lifecycle of agentic systems in production. We design and improve the core capabilities that shape agent behavior, including tool design, planning and execution loops, orchestration, evaluation, and safety guardrails. We build the operational foundations that make those systems dependable in practice, including observability, progressive delivery, reliability engineering, and live-site learning. And we build the best user experience for our customers, so they can use these agents seamlessly from any device.

We are looking for a full-stack Software Engineer II teammate to help build this next generation of agentic systems for cloud operations. This role is for engineers who care deeply about product quality, end-to-end ownership, and the details that separate an exciting prototype from a system people trust during critical moments.

Engineers on our team operate with high autonomy in a highly agile environment: short cycles, thin slices, feature flags, progressive delivery, and constant learning. We are looking for teammates with a strong owner’s mindset and a strong bias for action: engineers who take ownership of ambiguous problems, adopt modern scientific research, engineering patterns, and practices, move quickly, learn from production, and continuously raise the quality bar as they ship.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees, we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day, we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Responsibilities

Take ownership of important areas of the Azure SRE Agent Platform, including agent capabilities, orchestration, evaluation, user experiences on different form factors and supporting platform services
Build and iterate on agentic systems, including tools, planning and execution loops, evaluations, and safety mechanisms
Design and ship reliable capabilities that improve incident detection, diagnosis, mitigation, and operational learning
Use telemetry, experiments, evaluations, and user feedback to guide iteration and investment
Contribute to resilient, observable systems that operate safely and effectively in production
Partner closely with engineers, SREs, and product counterparts to turn ambiguous problems into high-quality shipped solutions
Participate in debugging, live-site learning, and post-incident hardening to continuously improve system quality
Contribute to architecture, engineering standards, and development practices across the team

Qualifications

Required Qualifications

Bachelor’s or Master’s degree in Computer Science, or equivalent practical experience.
4+ years of experience building production software using one or more modern programming languages such as C#, C++, Go, Java or Python.
Strong understanding of Generative AI & software engineering fundamentals, data structures, and problem-solving.
Ability to learn new technologies quickly and adapt to deliver customer and business impact.

Other Requirements

The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include but are not limited to the following specialized security screenings:

Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years

Preferred Qualifications

Hands-on experience building and operating LLM-powered agentic systems in production, with direct ownership of quality, reliability, and iteration
3+ years of experience building and operating cloud platforms or distributed services, with depth in service architecture, deployment, and observability
Strong product mindset with a track record of owning ambiguous problem spaces and driving them to high-quality outcomes
Solid engineering fundamentals, including systems design, performance, and debugging in complex production environments
Track record of designing, running, and optimizing evaluations for agentic systems, including tools, prompts, and agent loops
Expertise with Kubernetes, container orchestration, or cloud-native infrastructure is a strong plus
Experience contributing to or leading open-source projects at scale is a plus

This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.