AI Research Scientist - Safety Alignment Team

Meta · Big Tech · Menlo Park, CA

AI Research Scientist focused on safety alignment for large language models and multimodal AI systems. Responsibilities include designing, implementing, and evaluating novel safety techniques; curating datasets; fine-tuning LLMs to adhere to safety policies; and building infrastructure for evaluation and mitigation. Requires a PhD, 3+ years of research experience, a publication record, Python/PyTorch proficiency, and experience applying RL techniques to LLM fine-tuning.

What you'd actually do

  1. Design, implement, and evaluate novel safety alignment techniques for large language models and multimodal AI systems
  2. Create, curate, and analyze high-quality datasets for safety alignment
  3. Fine-tune and evaluate LLMs to adhere to Meta’s safety policies and evolving global standards
  4. Build scalable infrastructure and tools for safety evaluation, monitoring, and rapid mitigation of emerging risks
  5. Work closely with researchers, engineers, and cross-functional partners to integrate safety alignment into Meta’s products and services

Skills

Required

  • PhD in Computer Science, Machine Learning, or a relevant technical field
  • 3+ years of industry research experience in LLM/NLP, computer vision, or related AI/ML model training
  • Experience as a technical lead on a team and/or leading complex technical projects end to end
  • Programming experience in Python
  • Hands-on experience with frameworks such as PyTorch
  • Hands-on experience applying RL techniques (e.g., RLHF, PPO, DPO, GRPO, RLVF, reward modeling) to fine-tune large language models for safety and policy adherence
  • Experience developing, fine-tuning, or evaluating LLMs across multiple languages and modalities (text, image, voice, video)
  • Demonstrated ability to innovate in safety alignment, including custom guideline enforcement, dynamic policy adaptation, and rapid hotfixing of model vulnerabilities
  • Experience designing, curating, and evaluating safety datasets, including adversarial and borderline prompt pairs for risk mitigation
  • Experience with distributed training of LLMs (hundreds/thousands of GPUs), scalable safety mitigations, and automation of safety tooling

Nice to have

  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience

What the JD emphasized

  • Publications at peer-reviewed conferences (e.g., ICLR, NeurIPS, ICML, KDD, CVPR, ICCV, ACL)
  • Experience developing, fine-tuning, or evaluating LLMs across multiple languages and modalities (text, image, voice, video)
  • Demonstrated ability to innovate in safety alignment, including custom guideline enforcement, dynamic policy adaptation, and rapid hotfixing of model vulnerabilities
  • Experience designing, curating, and evaluating safety datasets, including adversarial and borderline prompt pairs for risk mitigation
  • Experience with distributed training of LLMs (hundreds/thousands of GPUs), scalable safety mitigations, and automation of safety tooling

Other signals

  • safety alignment
  • LLMs
  • multimodal AI
  • RLHF