Member of Technical Staff, AI Platform Engineer - Response Quality

Microsoft · Big Tech · Redmond, WA +1 · Software Engineering

The AI Platform Engineer role focuses on building and improving systems within the response quality stack for AI products at Microsoft, including data pipelines, evaluation frameworks, retrieval-augmented generation (RAG), and inference serving. The role involves designing and operating a data flywheel that collects production signals for model improvements, identifying and fixing quality degradation, shipping production retrieval systems, and collaborating with researchers and cross-functional teams to translate quality insights into shipped improvements. The position requires end-to-end ownership of production code and emphasizes rapid iteration and deployment.

What you'd actually do

  1. Build and improve systems across the response quality stack, including data pipelines, evaluation frameworks, retrieval (RAG), and inference serving.
  2. Design and operate our data flywheel: instrument, collect, curate, and feed production signals back into model and system improvements.
  3. Isolate and act on loss signals by identifying where and why quality degrades, then build the tooling and systems to fix it.
  4. Ship production retrieval systems by building, tuning, and scaling RAG pipelines that directly improve response quality.
  5. Deploy and iterate fast. Write production code in Python and TypeScript, owning it end to end from prototype to production.

Skills

Required

  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.

Nice to have

  • Master's degree in Computer Science or related technical discipline AND 6+ years technical engineering experience building web services with coding in languages including, but not limited to, Python, Golang, Java/C#, Scala, or Rust.
  • 1+ years of hands-on experience building, deploying, or improving AI/ML systems in production.
  • Strong agentic coding skills.
  • Deep experience with C# or Java.
  • Experience with at least one of the following: C#/Java, Golang, or TypeScript (React/Next.js).
  • Comfort with fast-paced, high-ambiguity environments.
  • 6+ years of experience building and releasing production software at the platform level.
  • Deep experience with all of the following languages: Golang, Java/Scala, TypeScript (React/Next.js).
  • Experience in model pretraining, post-training, evaluation, and inference.
  • Experience using machine learning frameworks, including experience using, deploying, and scaling large language models, either personally or professionally.
  • Ability to clearly communicate complex technical concepts to both technical and nontechnical stakeholders.
  • Experience going from zero to one as well as working with developed systems.
  • Ability to work in a fast-paced environment, manage multiple priorities, and adapt to changing requirements and deadlines.
  • Demonstrated interpersonal skills and ability to work closely with cross-functional teams, including product managers, designers, and other engineers.
  • Proven ability to collaborate and contribute to a positive, inclusive work environment, fostering knowledge sharing and growth within the team.

What the JD emphasized

  • production retrieval systems
  • production code
  • production software

Other signals

  • response quality stack
  • data flywheel
  • loss signals
  • AI Platform Engineer