AI Research Scientist - Safety Alignment Team

Meta · Big Tech · Menlo Park, CA

AI Research Scientist focused on safety alignment for large language models and multimodal AI systems. Responsibilities include designing, implementing, and evaluating novel safety techniques; curating datasets; fine-tuning LLMs to adhere to safety policies; and building infrastructure for evaluation and mitigation. Requires a PhD, 3+ years of research experience, a publication record, Python/PyTorch proficiency, and experience applying RL techniques to LLM fine-tuning.

What you'd actually do

  1. Design, implement, and evaluate novel safety alignment techniques for large language models and multimodal AI systems
  2. Create, curate, and analyze high-quality datasets for safety alignment
  3. Fine-tune and evaluate LLMs to adhere to Meta’s safety policies and evolving global standards
  4. Build scalable infrastructure and tools for safety evaluation, monitoring, and rapid mitigation of emerging risks
  5. Work closely with researchers, engineers, and cross-functional partners to integrate safety alignment into Meta’s products and services

Skills

Required

  • PhD in Computer Science, Machine Learning, or a relevant technical field
  • 3+ years of industry research experience in LLM/NLP, computer vision, or related AI/ML model training
  • Experience as a technical lead on a team and/or leading complex technical projects end to end
  • Programming experience in Python
  • Hands-on experience with frameworks such as PyTorch
  • Hands-on experience applying RL techniques (e.g., RLHF, PPO, DPO, GRPO, RLVF, reward modeling) to fine-tune large language models for safety and policy adherence
  • Experience developing, fine-tuning, or evaluating LLMs across multiple languages and modalities (text, image, voice, video)
  • Demonstrated ability to innovate in safety alignment, including custom guideline enforcement, dynamic policy adaptation, and rapid hotfixing of model vulnerabilities
  • Experience designing, curating, and evaluating safety datasets, including adversarial and borderline prompt pairs for risk mitigation
  • Experience with distributed training of LLMs (hundreds/thousands of GPUs), scalable safety mitigations, and automation of safety tooling

Nice to have

  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience

What the JD emphasized

  • Publications at peer-reviewed conferences (e.g., ICLR, NeurIPS, ICML, KDD, CVPR, ICCV, ACL)
  • Experience developing, fine-tuning, or evaluating LLMs across multiple languages and modalities (text, image, voice, video)
  • Demonstrated ability to innovate in safety alignment, including custom guideline enforcement, dynamic policy adaptation, and rapid hotfixing of model vulnerabilities
  • Experience designing, curating, and evaluating safety datasets, including adversarial and borderline prompt pairs for risk mitigation
  • Experience with distributed training of LLMs (hundreds/thousands of GPUs), scalable safety mitigations, and automation of safety tooling

Other signals

  • safety alignment
  • LLMs
  • multimodal AI
  • RLHF