Member of Technical Staff - Principal Data Infrastructure Engineer

Microsoft · Big Tech · Redmond, WA +1 · Data Engineering

This Principal Data Infrastructure Engineer role focuses on enabling large-scale data and ML pipelines and intelligent systems for consumer AI. It involves architecting and maintaining scalable, reliable, and observable big data infrastructure; championing DevOps and SRE best practices; and building a self-service big data platform. The role requires collaborating with Data Engineers, Data Scientists, AI Researchers, and Developers to deliver secure, seamless big data workflows and to optimize system performance and security. While it does not involve building AI models directly, the role is critical to supporting AI applications and infrastructure.

What you'd actually do

  1. Architect and maintain scalable, reliable, and observable Big Data Infrastructure for mission-critical AI applications.
  2. Champion DevOps and SRE best practices—automated deployments, service monitoring, and incident response.
  3. Build a self-service big data platform that empowers data and platform engineers and researchers.
  4. Develop robust CI/CD pipelines and automate infrastructure provisioning using Infrastructure as Code tools (Bicep, Terraform, ARM).
  5. Collaborate with Data Engineers, Data Scientists, AI Researchers, and Developers to deliver secure, seamless big data workflows.

Skills

Required

  • Master's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or a related field AND 4+ years of experience in business analytics, data science, software development, data modeling, or data engineering; OR
  • Bachelor's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or a related field AND 6+ years of experience in business analytics, data science, software development, data modeling, or data engineering; OR
  • equivalent experience.

Nice to have

  • 4+ years in Big Data Infrastructure, DevOps, SRE, or Platform Engineering.
  • 3+ years of hands-on experience managing and scaling distributed systems—from bare-metal to cloud-native environments.
  • 2+ years deploying containerized applications using Kubernetes and Helm/Kustomize.
  • Solid scripting and automation skills using Python, Bash, or PowerShell.
  • Proven success in CI/CD pipeline management, release automation, and production troubleshooting.
  • Experience working with Databricks for scalable data processing and analytics.
  • Familiarity with security practices in infrastructure environments, including IAM, OAuth, and Kerberos administration.
  • Proven experience with cloud-native infrastructure across Azure, AWS, or GCP.
  • Hands-on expertise with modern data platforms like Databricks.
  • Deep understanding of data storage and processing technologies:
      • Relational & NoSQL databases.
      • Key-value stores.
      • Spark compute engines.
      • Distributed file systems (e.g., HDFS, ADLS Gen2).
      • Messaging systems (e.g., Event Hub, Kafka, RabbitMQ).
  • Capacity planning and incident management for large-scale big data systems.
  • Solid collaboration history with Data Engineers, Data Scientists, ML Engineers, Networking, and Security teams.
  • Familiarity with modern web stacks: TypeScript, Node.js, React, and optionally PHP.
  • Exposure to agentic workflows, deep learning, or AI frameworks.
  • Practical experience integrating LLMs (e.g., GPT-based models) into daily workflows—automating documentation, code generation, reviews, and operational intelligence.
  • Solid grasp of prompt engineering techniques to design, optimize, and evaluate interactions with LLMs.
  • Demonstrated ability to troubleshoot and resolve complex performance and scalability issues across infrastructure layers.
  • Excellent interpersonal and communication skills, with a passion for mentorship and continuous learning.
  • Experience applying LLMs to DevOps workflows, enhancing incident response, and streamlining cross-functional collaboration is a strong advantage.

What the JD emphasized

  • mission-critical AI applications
  • big data platform
  • big data workflows
  • data pipelines and infrastructure
  • system performance
  • system security
  • big data infrastructure
  • distributed systems
  • data processing and analytics
  • data storage and processing technologies
  • large-scale big data systems
  • AI frameworks
  • LLMs into daily workflows
  • LLMs to DevOps workflows