Member of Technical Staff, Applied AI Engineer

Microsoft · Big Tech · Mountain View, CA +2 · Data Science

An Applied AI Engineer role focused on building and shipping LLM-powered assistant features, agentic behaviors, and retrieval pipelines. Responsibilities include prompt architecture, orchestration logic, evaluation frameworks, hillclimbing for performance improvements, and internal tooling for experimentation and debugging. The role also involves integrating LLMs with product surfaces and building lightweight ML components for enhanced assistant intelligence. It requires experience shipping production-level code and models, and comfort working on fast-moving AI teams with startup-like energy.

What you'd actually do

  1. Design and ship LLM‑powered assistant features, including conversational flows, agentic behaviors, retrieval pipelines, and multimodal interactions.
  2. Build prompt architectures, system instructions, and orchestration logic that ensure reliability, grounding, and personality consistency.
  3. Build and maintain evaluation frameworks for correctness, safety, grounding, and UX quality.
  4. Run hillclimbing loops across prompts, models, and tool‑use strategies to continuously improve assistant performance.
  5. Develop internal tools for prompt experimentation, model-comparison telemetry, debugging, and automated eval pipelines.
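The hillclimbing loop in steps 3–4 amounts to a simple search: propose prompt variants, score each one with an automated eval, keep the best, and repeat. A minimal, hypothetical sketch — `toy_model`, `score_prompt`, and the eval cases are illustrative stand-ins, not any real internal tooling:

```python
# Minimal hillclimbing loop: score each prompt variant against a small
# gold set and keep the best. All names here are illustrative.

EVAL_CASES = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]

PROMPT_VARIANTS = [
    "Answer briefly: {input}",
    "You are a precise assistant. Reply with only the answer: {input}",
    "Think step by step, then answer: {input}",
]

def toy_model(prompt: str) -> str:
    """Deterministic stand-in for an LLM call: answers tersely only
    when the prompt asks for 'only the answer'."""
    answers = {"2 + 2": "4", "capital of France": "Paris"}
    for question, answer in answers.items():
        if question in prompt:
            return answer if "only the answer" in prompt else f"The answer is {answer}."
    return "unknown"

def score_prompt(template: str, cases: list[dict]) -> float:
    """Fraction of eval cases the model answers exactly right."""
    hits = sum(
        toy_model(template.format(input=case["input"])) == case["expected"]
        for case in cases
    )
    return hits / len(cases)

def hillclimb(variants: list[str], cases: list[dict]) -> tuple[str, float]:
    """One hillclimbing step: pick the variant with the best eval score."""
    best = max(variants, key=lambda template: score_prompt(template, cases))
    return best, score_prompt(best, cases)

best_prompt, best_score = hillclimb(PROMPT_VARIANTS, EVAL_CASES)
```

In practice the model call, eval cases, and scorer would come from real telemetry and automated eval pipelines; the loop shape — propose, score, keep the best, repeat — is the part the role emphasizes.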

Skills

Required

  • One of the following:
      ◦ Doctorate in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or a related field; OR
      ◦ Master's degree in one of those fields AND 1+ year(s) of data-science experience (e.g., managing structured and unstructured data, applying statistical techniques, and reporting results) or consulting experience; OR
      ◦ Bachelor's degree in one of those fields AND 2+ years of data-science experience, or equivalent experience.

Nice to have

  • One of the following:
      ◦ Doctorate in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or a related field AND 1+ year(s) of data-science experience (e.g., managing structured and unstructured data, applying statistical techniques, and reporting results); OR
      ◦ Master's degree in one of those fields AND 3+ years of data-science experience; OR
      ◦ Bachelor's degree in one of those fields AND 5+ years of data-science experience; OR
      ◦ equivalent experience.
  • 2+ years shipping production-level code, models, or data analysis.
  • 1+ years using AI-assisted coding and analysis techniques.
  • Experience working on small teams and in mid-stage startup environments.
  • Experience working on AI products.
  • PhD in engineering, applied math, statistics, or related analytical field.
  • 4+ years shipping production-level code, models, or data analysis.
  • Deep experience building from zero-to-one.
  • Hands-on work hillclimbing AI evaluations.

What the JD emphasized

  • LLM product engineering
  • evaluation science
  • hillclimbing
  • internal tool building
  • startup-founder energy
  • bias for action
  • rapid iteration
  • comfort with ambiguity
  • fast-moving AI team
  • ideas ship quickly
  • impact is immediate
  • culture of experimentation
  • clarity
  • high-quality execution
  • shipping production-level code, models, or data analysis
  • building from zero-to-one
  • hands-on work hillclimbing AI evaluations

Other signals

  • LLM-powered assistant features
  • agentic behaviors
  • retrieval pipelines
  • multimodal interactions
  • prompt architectures
  • orchestration logic
  • evaluation frameworks
  • safety
  • UX quality
  • hillclimbing loops
  • tool-use strategies
  • failure modes
  • prompt experimentation
  • model comparison telemetry
  • automated eval pipelines
  • reusable frameworks
  • ranking
  • classification
  • summarization
  • personalization