Member of Technical Staff, Applied AI Engineer

Microsoft · Big Tech · Mountain View, CA +2 · Data Science

An Applied AI Engineer role focused on building and shipping LLM-powered assistant features, agentic behaviors, and retrieval pipelines. Responsibilities include prompt architecture, orchestration logic, evaluation frameworks, hillclimbing for performance improvements, and internal tooling for experimentation and debugging. The role also involves integrating LLMs with product surfaces and building lightweight ML components for enhanced assistant intelligence. It requires experience shipping production-level code and models, and comfort working on fast-moving AI teams with startup-like energy.

What you'd actually do

  1. Design and ship LLM‑powered assistant features, including conversational flows, agentic behaviors, retrieval pipelines, and multimodal interactions.
  2. Build prompt architectures, system instructions, and orchestration logic that ensure reliability, grounding, and personality consistency.
  3. Build and maintain evaluation frameworks for correctness, safety, grounding, and UX quality.
  4. Run hillclimbing loops across prompts, models, and tool‑use strategies to continuously improve assistant performance.
  5. Develop internal tools for prompt experimentation, model-comparison telemetry, debugging, and automated eval pipelines.
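The hillclimbing loop in steps 3–4 amounts to a simple search: propose prompt variants, score each one with an automated eval, keep the best, and repeat. A minimal, hypothetical sketch — `toy_model`, `score_prompt`, and the eval cases are illustrative stand-ins, not any real internal tooling:

```python
# Minimal hillclimbing loop: score each prompt variant against a small
# gold set and keep the best. All names here are illustrative.

EVAL_CASES = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]

PROMPT_VARIANTS = [
    "Answer briefly: {input}",
    "You are a precise assistant. Reply with only the answer: {input}",
    "Think step by step, then answer: {input}",
]

def toy_model(prompt: str) -> str:
    """Deterministic stand-in for an LLM call: answers tersely only
    when the prompt asks for 'only the answer'."""
    answers = {"2 + 2": "4", "capital of France": "Paris"}
    for question, answer in answers.items():
        if question in prompt:
            return answer if "only the answer" in prompt else f"The answer is {answer}."
    return "unknown"

def score_prompt(template: str, cases: list[dict]) -> float:
    """Fraction of eval cases the model answers exactly right."""
    hits = sum(
        toy_model(template.format(input=case["input"])) == case["expected"]
        for case in cases
    )
    return hits / len(cases)

def hillclimb(variants: list[str], cases: list[dict]) -> tuple[str, float]:
    """One hillclimbing step: pick the variant with the best eval score."""
    best = max(variants, key=lambda template: score_prompt(template, cases))
    return best, score_prompt(best, cases)

best_prompt, best_score = hillclimb(PROMPT_VARIANTS, EVAL_CASES)
```

In practice the model call, eval cases, and scorer would come from real telemetry and automated eval pipelines; the loop shape — propose, score, keep the best, repeat — is the part the role emphasizes.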

Skills

Required

  • One of the following:
      ◦ Doctorate in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or a related field; OR
      ◦ Master's degree in one of those fields AND 1+ year(s) of data-science experience (e.g., managing structured and unstructured data, applying statistical techniques, and reporting results) or consulting experience; OR
      ◦ Bachelor's degree in one of those fields AND 2+ years of data-science experience, or equivalent experience.

Nice to have

  • One of the following:
      ◦ Doctorate in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or a related field AND 1+ year(s) of data-science experience (e.g., managing structured and unstructured data, applying statistical techniques, and reporting results); OR
      ◦ Master's degree in one of those fields AND 3+ years of data-science experience; OR
      ◦ Bachelor's degree in one of those fields AND 5+ years of data-science experience; OR
      ◦ equivalent experience.
  • 2+ years shipping production-level code, models, or data analysis.
  • 1+ years using AI-assisted coding and analysis techniques.
  • Experience working on small teams and in mid-stage startup environments.
  • Experience working on AI products.
  • PhD in engineering, applied math, statistics, or related analytical field.
  • 4+ years shipping production-level code, models, or data analysis.
  • Deep experience building from zero-to-one.
  • Hands-on work hillclimbing AI evaluations.

What the JD emphasized

  • LLM product engineering
  • evaluation science
  • hillclimbing
  • internal tool building
  • startup-founder energy
  • bias for action
  • rapid iteration
  • comfort with ambiguity
  • fast-moving AI team
  • ideas ship quickly
  • impact is immediate
  • culture of experimentation
  • clarity
  • high-quality execution
  • shipping production-level code, models, or data analysis
  • building from zero-to-one
  • hands-on work hillclimbing AI evaluations

Other signals

  • LLM-powered assistant features
  • agentic behaviors
  • retrieval pipelines
  • multimodal interactions
  • prompt architectures
  • orchestration logic
  • evaluation frameworks
  • safety
  • UX quality
  • hillclimbing loops
  • tool-use strategies
  • failure modes
  • prompt experimentation
  • model comparison telemetry
  • automated eval pipelines
  • reusable frameworks
  • ranking
  • classification
  • summarization
  • personalization