Research Scientist Manager, Meta AI Assistant Measurement

Meta · Big Tech · Menlo Park, CA

A Research Scientist Manager to lead a team focused on measurement and evaluation of AI Assistants powered by foundation models. The role involves defining the scientific strategy for evaluation, upholding methodological rigor, and collaborating with product, engineering, and training teams to ensure reliability and trustworthiness at scale.

What you'd actually do

  1. Define and execute the scientific roadmap for measurement of the AI Assistant with purpose, agility, and efficiency
  2. Lead the development of innovative offline and online evaluation metrics, benchmarks, and data synthesis methodologies to drive model improvement and improve user experience
  3. Partner closely with research, engineering, and product teams to integrate robust measurement into the model development lifecycle (the "evaluation flywheel")
  4. Communicate, collaborate, and build relationships with clients and peer teams to facilitate cross-functional projects
  5. Mentor and grow a team of industry-leading research scientists and applied scientists, fostering a culture of scientific rigor, impact, and success

Skills

Required

  • 12+ years (or PhD plus 8 years) of hands-on experience in large language models, NLP, and Transformer modeling, in both research and engineering development settings
  • 2+ years of people-management experience leading research scientists or applied scientists
  • Proven technical vision for where the field of generative AI is headed
  • Experience developing evaluation frameworks for LLMs, multimodal models, or interactive assistants
  • Expertise in online measurement, user-behavior modeling, A/B testing, and experiment platform design
  • Background in safety, alignment, content evaluation, or human-AI interaction
  • Ability to communicate complex ideas clearly to executives, product partners, and engineering stakeholders
  • Proven ability to build high-performing teams that thrive under ambiguity and rapid iteration
  • Demonstrated experience recruiting, building, structuring, and leading technical organizations, including performance management

Nice to have

  • Track record of landing large research and/or product impacts in a time-sensitive environment
  • Experience with, and knowledge of, online and offline measurement, benchmark building, and data synthesis
  • Experience building large-scale datasets and ontologies used for training or evaluation
  • Experience with cross-functional collaboration with other teams, including non-engineering and non-technical functions
  • PhD or equivalent research experience in fields such as machine learning, statistics, econometrics, causal inference, computer science, optimization, or related areas

What the JD emphasized

  • landing large research and/or product impacts in a time-sensitive environment
  • Proven technical vision for where the field of generative AI is headed
  • Demonstrated experience recruiting, building, structuring, and leading technical organizations, including performance management

Other signals

  • measurement and evaluation paradigms
  • align model development with end-user value
  • scientific strategy behind the evaluation flywheel
  • reliability and trustworthiness at scale