Research Engineer, Preparedness - Meta Superintelligence Labs

Meta Meta · Big Tech · Menlo Park, CA +1 · Remote

Research Engineer role focused on evaluating frontier AI systems and risks, developing and refining evaluations for multimodal and agentic models, and producing technical artifacts to inform risk assessments and launch decisions. Requires strong ML engineering and research skills, experience with agentic and multimodal models, and understanding of AI safety and threat models.

What you'd actually do

  1. Build and continuously refine evaluations for multimodal and agentic frontier AI models, including in cybersecurity, chemical security, and biosecurity
  2. Build robust, reusable evaluation pipelines that scale across multiple model lines and product areas
  3. Produce auditable technical artifacts, including evaluation reports and model cards, at high reliability and speed
  4. Scope and deliver end-to-end evaluations under ambiguous and rapidly shifting requirements, re-prioritizing as the threat landscape and Meta’s frontier models evolve
  5. Work across research, engineering, policy, and legal teams to align evaluation priorities with launch timelines

Skills

Required

  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • 3+ years of experience in machine learning engineering, machine learning research, or a related technical role
  • Proficiency in Python and experience with ML frameworks
  • Experience identifying, designing and completing medium to large technical features independently, without guidance
  • Proven experience in software engineering practices including version control, testing, and code review practices
  • Experience implementing or developing benchmarks for agentic large language models and multimodal models (e.g., vision-language, audio, video, browser agents)
  • Experience working with large-scale distributed systems and data pipelines
  • Experience in red-teaming AI systems, adversarial machine learning, or abuse prevention systems

Nice to have

  • Publications at peer-reviewed venues (NeurIPS, ICML, ICLR, ACL, EMNLP, or similar) related to language model evaluation, AI safety, or deep learning
  • Background in biology or chemistry, particularly chemical, biological, radiological, and nuclear (CBRN) risk domains and experience designing evaluations or threat assessments related to dual-use scientific knowledge
  • Background in cybersecurity, penetration testing, or security research, particularly as it relates to assessing AI-enabled cyber capabilities or designing mitigations for AI-assisted exploitation
  • Track record of open-source contributions to ML evaluation tools or benchmarks

What the JD emphasized

  • high reliability
  • rigor
  • scalability paramount
  • high velocity
  • rapidly shifting priorities
  • ambiguous and rapidly shifting requirements
  • high reliability and speed

Other signals

  • evaluating frontier AI capabilities and risks
  • develop new evaluations grounded in real world threat models
  • ensure evaluations are in place to mitigate risks
  • responsibly handle the development of frontier AI