Member of Technical Staff - Data Scientist

Microsoft · Big Tech · Mountain View, CA +4 · Software Engineering

Data Scientist role focused on building next-generation post-training methods for frontier models at Microsoft AI. Responsibilities include designing evaluations, producing high-quality training data, building scalable data pipelines, and running post-training experiments to improve model capabilities like instruction following, coding, and agentic behaviors. The role operates across the full post-training lifecycle, from data generation to reward modeling and reinforcement learning, with a focus on turning raw model capability into reliable and measurable performance improvements.

What you'd actually do

  1. Design evaluations of advanced model capabilities and use them to drive rapid, high-signal iteration loops
  2. Work with vendors to produce high-quality evaluation and training data
  3. Build scalable data pipelines to produce high-quality evaluation and training data
  4. Build data flywheels to hill-climb on model weaknesses, using data from various surfaces where our models are deployed
  5. Ensure optimal quality, quantity, and coverage of data across all post-training stages
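The pipeline and data-quality responsibilities above can be pictured as a filtering stage. This is a minimal illustrative sketch, not anything from the JD: the `Example` schema, score field, and thresholds are all hypothetical placeholders for whatever a real post-training pipeline would use.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Example:
    prompt: str
    response: str
    quality_score: float  # hypothetical: e.g. a reward-model or rater score

def filter_training_data(examples, min_score=0.7, min_response_len=10):
    """Toy pipeline stage: dedupe on exact prompt, drop short or
    low-scoring responses. Thresholds are illustrative only."""
    seen = set()
    kept = []
    for ex in examples:
        if ex.prompt in seen:
            continue  # exact-prompt dedup (real pipelines use fuzzy/near-dup)
        if len(ex.response) < min_response_len:
            continue  # drop trivially short responses
        if ex.quality_score < min_score:
            continue  # drop low-quality examples
        seen.add(ex.prompt)
        kept.append(ex)
    return kept
```

In practice each of these checks would be its own stage (near-duplicate detection, model-based quality scoring, coverage balancing), but the shape, a stream of examples passing through explicit keep/drop rules, is the same.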

Skills

Required

  • Hands-on experience with large language models, including training or applying them in production
  • Designing and running post-training experiments (evals, ablations, preference tuning / RLHF-style methods)
  • Building and owning scalable data pipelines for training and evaluation data
  • Strong Python skills for ML experimentation, data processing, and analysis
  • Solid statistical, experimental, and general engineering fundamentals

Nice to have

  • Demonstrated SOTA results in any area of large-scale training, inference, or evaluation

What the JD emphasized

  • Hands-on experience with large language models, including training them or applying them in production (not just prompting)
  • Designing and running post-training experiments (evals, ablations, preference tuning / RLHF-style methods)
  • Building and owning scalable data pipelines for training and evaluation data

Other signals

  • post-training methods for frontier models
  • evaluation design
  • high-quality training data
  • scalable data pipelines
  • state-of-the-art foundation models