AI Research Scientist - MSL FAIR Foundations

Meta · Big Tech · Menlo Park, CA

Research Scientist role focused on developing and implementing novel evaluations for frontier AI systems, shaping research direction and model development. Requires a strong ML research background, experience with LLM/multimodal evaluation, and a publication record.

What you'd actually do

  1. Curate and integrate publicly available and internal benchmarks to steer the capability development of frontier models
  2. Develop and implement evaluation environments, including environments for novel model capabilities and modalities
  3. Collaborate with external data vendors to source and prepare high-quality evaluation datasets
  4. Execute on the technical vision of research scientists who design new benchmarks and evaluations
  5. Build robust, reusable evaluation pipelines that scale across multiple model lines and product areas (a minimal sketch follows this list)
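
To make the last item concrete, here is a minimal sketch of what a reusable evaluation pipeline can look like. Every name in it (EvalTask, run_eval, the toy model) is hypothetical and illustrative, not a Meta-internal API; a production harness would add batching, caching, result logging, and multimodal inputs.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class EvalTask:
    """One benchmark: prompts paired with reference answers.

    Hypothetical structure for illustration only.
    """
    name: str
    examples: list[tuple[str, str]]  # (prompt, reference) pairs

def exact_match(prediction: str, reference: str) -> bool:
    # Normalize whitespace and case so trivial formatting
    # differences are not counted as errors.
    return prediction.strip().lower() == reference.strip().lower()

def run_eval(model: Callable[[str], str], tasks: list[EvalTask]) -> dict[str, float]:
    """Run every task through the model and report per-task accuracy.

    `model` is any prompt -> completion callable, so the same
    pipeline can wrap different model lines behind one interface.
    """
    results: dict[str, float] = {}
    for task in tasks:
        correct = sum(
            exact_match(model(prompt), reference)
            for prompt, reference in task.examples
        )
        results[task.name] = correct / len(task.examples)
    return results

if __name__ == "__main__":
    # Dummy "model" so the sketch runs end to end.
    toy_model = lambda prompt: "4" if "2 + 2" in prompt else "unknown"
    arithmetic = EvalTask("toy-arithmetic", [("What is 2 + 2?", "4")])
    print(run_eval(toy_model, [arithmetic]))  # {'toy-arithmetic': 1.0}
```

The design choice that makes this reusable is the narrow prompt-to-completion callable boundary: any model line that can be wrapped in that interface plugs into the same pipeline unchanged.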

Skills

Required

  • PhD degree in Computer Science, Machine Learning, or a related technical field
  • 3+ years of experience in machine learning engineering, machine learning research, or a related technical role
  • Proficiency in Python and experience with ML frameworks such as PyTorch
  • Experience independently identifying, designing, and completing medium-to-large technical features
  • Proven success with software engineering practices, including version control, testing, and code review
  • Publications at peer-reviewed venues (NeurIPS, ICML, ICLR, ACL, EMNLP, or similar) related to language model evaluation, benchmarking, or deep learning
  • Hands-on experience with language model post-training and deep learning systems, or with building reinforcement learning environments
  • Experience implementing or developing evaluation benchmarks for large language models and multimodal models (e.g., vision-language, audio, video)
  • Experience working with large-scale distributed systems and data pipelines
  • Familiarity with language model evaluation frameworks and metrics (one representative metric is sketched after this list)
  • Track record of open-source contributions to ML evaluation tools or benchmarks
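
As an example of the kind of metric such frameworks implement, below is the unbiased pass@k estimator from Chen et al. (2021), widely used in code-generation benchmarking. The function name is ours, but the formula is the standard one.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: total samples generated per problem
    c: number of those samples that pass the tests
    k: sampling budget; returns the probability that at least
       one of k samples drawn from the n generations passes
    """
    if n - c < k:
        # Fewer than k failing samples exist, so any k-subset
        # must contain at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 200 samples per problem, 30 correct -> pass@10 ≈ 0.81
print(round(pass_at_k(200, 30, 10), 2))
```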

Nice to have

  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience

What the JD emphasized

  • evaluations are the core of AI progress
  • novel evaluations
  • AI capability measurement
  • scientific validity
  • methodological rigor
  • measurable benchmarks
  • evaluation insights
  • frontier AI development
  • evaluation environments
  • evaluation datasets
  • evaluation pipelines
  • evaluation suites
  • language model evaluation
  • benchmarking
  • language model evaluation frameworks and metrics
  • ML evaluation tools or benchmarks

Other signals

  • shape the future of AI capability measurement
  • translate organizational priorities into measurable benchmarks
  • translate evaluation insights back into research direction