Research Engineer - MSL FAIR Foundations

Meta · Big Tech · Menlo Park, CA

Research Engineer role focused on building and curating benchmarks and evaluation environments for advanced AI models across text, vision, and audio. The role involves developing novel benchmarks, integrating existing ones, and creating scalable evaluation tooling to directly impact research direction and model development. Requires strong ML engineering and research skills, Python proficiency, experience with ML frameworks, and a track record of publications in relevant venues.

What you'd actually do

  1. Curate and integrate publicly available and internal benchmarks to guide the development of frontier model capabilities
  2. Develop and implement evaluation environments, including environments for novel model capabilities and modalities
  3. Collaborate with external data vendors to source and prepare high-quality evaluation datasets
  4. Execute on the technical vision of research scientists designing new benchmarks and evaluations
  5. Build robust, reusable evaluation pipelines that scale across multiple model lines and product areas

Skills

Required

  • Python
  • ML frameworks such as PyTorch
  • Identifying, designing, and completing medium-to-large technical features independently
  • Software engineering practices, including version control, testing, and code review
  • Publications at peer-reviewed venues (NeurIPS, ICML, ICLR, ACL, EMNLP, or similar) related to language model evaluation, benchmarking, or deep learning
  • Building reinforcement learning environments
  • Implementing or developing evaluation benchmarks for large language models and multimodal models
  • Working with large-scale distributed systems and data pipelines
  • Language model evaluation frameworks and metrics

Nice to have

  • Experience with AI tools
  • Hands-on experience with language model post-training
  • Deep learning systems
  • Track record of open-source contributions to ML evaluation tools or benchmarks

What the JD emphasized

  • evaluations are the core of AI progress
  • novel benchmarks
  • evaluation tooling at scale
  • rigor and scalability are paramount
  • novel benchmarks and environments
  • evaluation tooling
  • language model evaluation, benchmarking
  • large language models and multimodal models
  • language model evaluation frameworks and metrics
  • ML evaluation tools or benchmarks

Other signals

  • evaluations are the core of AI progress
  • curate and build the benchmarks for our most advanced AI models
  • evaluations you build will directly impact the research direction
  • from implementing existing benchmarks to developing novel benchmarks and environments
  • implementing evaluation tooling at scale