Research Intern - AI Evaluation and Alignment

Microsoft · Big Tech · Redmond, WA +1 · Applied Sciences

Research Intern role focused on advancing the quality, reliability, and evaluation of LLM-based systems by exploring new ML methods for AI assessment and alignment. Responsibilities include co-developing research projects, implementing ML approaches (training/fine-tuning), and developing evaluation frameworks. Requires PhD enrollment in a technical field and hands-on LLM experience.

What you'd actually do

  1. Co-developing a research project in collaboration with the supervisor and research mentors.
  2. Designing and implementing machine learning approaches, including training and fine-tuning using real-world datasets.
  3. Developing evaluation frameworks and benchmarking methods to assess model quality, robustness, and generalization.
  4. Presenting and communicating research findings.

Skills

Required

  • PhD program enrollment in Statistics, Computer Science, Physics, Operations Research, or a related technical field
  • 1 year of hands-on experience working on LLM-related projects (e.g., prompt engineering, building and evaluating LLM-based systems, reward modeling)
  • 1 year of experience coding in Python

Nice to have

  • Prior experience with reward models for large language models or LLM-as-a-Judge systems
  • Strong experience with deep learning frameworks (e.g., PyTorch, TensorFlow)
  • Familiarity with software engineering best practices (e.g., Git)
  • Experience with LLM post-training and evaluation or LLM-based judge systems
  • Research experience demonstrated through publications or projects
  • Ability to work independently in ambiguous or rapidly evolving situations and collaborate effectively across disciplines

What the JD emphasized

  • LLM-based systems
  • evaluation
  • alignment
  • training
  • fine-tuning
  • evaluation frameworks
  • benchmarking methods
  • model quality
  • robustness
  • generalization

Other signals

  • advancing quality, reliability, and evaluation of LLM-based systems
  • exploring new machine learning methods that improve how AI systems assess and align with human expectations
  • developing evaluation frameworks and benchmarking methods to assess model quality, robustness, and generalization