Research Scientist Manager, Meta AI Assistant Measurement

Meta · Big Tech · Menlo Park, CA

A Research Scientist Manager to lead a team focused on measurement and evaluation of AI Assistants powered by foundation models. The role involves defining the scientific strategy for evaluation, upholding methodological rigor, and collaborating with product, engineering, and training teams to ensure reliability and trustworthiness at scale.

What you'd actually do

  1. Define and execute the scientific roadmap for measurement of the AI Assistant with purpose, agility, and efficiency
  2. Lead the development of innovative offline and online evaluation metrics, benchmarks, and data synthesis methodologies to drive model improvement and improve user experience
  3. Partner closely with research, engineering, and product teams to integrate robust measurement into the model development lifecycle (the "evaluation flywheel")
  4. Communicate, collaborate, and build relationships with clients and peer teams to facilitate cross-functional projects
  5. Mentor and grow a team of industry-leading research scientists and applied scientists, fostering a culture of scientific rigor, impact, and success

Skills

Required

  • 12+ years (or PhD plus 8 years) of hands-on experience in large language models, NLP, and Transformer modeling, in both research and engineering development settings
  • 2+ years of people-management experience leading research scientists or applied scientists
  • Proven technical vision for where the field of generative AI is headed
  • Experience developing evaluation frameworks for LLMs, multimodal models, or interactive assistants
  • Expertise in online measurement, user-behavior modeling, A/B testing, and experiment platform design
  • Background in safety, alignment, content evaluation, or human-AI interaction
  • Ability to communicate complex ideas clearly to executives, product partners, and engineering stakeholders
  • Proven ability to build high-performing teams that thrive under ambiguity and rapid iteration
  • Demonstrated experience recruiting, building, structuring, and leading technical organizations, including performance management

Nice to have

  • Track record of landing large research and/or product impacts in a time-sensitive environment
  • Experience with, and knowledge of, online and offline measurement, benchmark building, and data synthesis
  • Experience building large-scale datasets and ontologies used for training or evaluation
  • Experience with cross-functional collaboration with other teams, including non-engineering and non-technical functions
  • PhD or equivalent research experience in fields such as machine learning, statistics, econometrics, causal inference, computer science, optimization, or related areas

What the JD emphasized

  • landing large research and/or product impacts in a time-sensitive environment
  • Proven technical vision for where the field of generative AI is headed
  • Demonstrated experience recruiting, building, structuring, and leading technical organizations, including performance management

Other signals

  • measurement and evaluation paradigms
  • align model development with end-user value
  • scientific strategy behind the evaluation flywheel
  • reliability and trustworthiness at scale