Machine Learning Engineer, Safeguards Research

Anthropic Anthropic · AI Frontier · AI Research & Engineering

Machine Learning Engineer focused on safeguards research, bridging research and engineering. This role involves developing end-to-end pipelines and ML systems for safety research, including training/fine-tuning models, building scalable infrastructure for evaluation, implementing efficient training pipelines, and creating automated systems to understand and mitigate AI risks. The role requires strong ML fundamentals, engineering practices, and experience with Python, ML frameworks, and LLMs.

What you'd actually do

  1. Design and implement ML pipelines for training and evaluating safety classifiers and detection models
  2. Develop systems to fine-tune language models for specific safety evaluation tasks
  3. Build infrastructure for hyperparameter optimization and model selection across safety experiments
  4. Create efficient data processing pipelines that can handle large-scale model outputs and training datasets
  5. Develop tooling to automate the generation, analysis, and classification of jailbreak attempts

Skills

Required

  • Hands-on experience training and fine-tuning basic ML models
  • Fundamental ML concepts like overfitting and regularization
  • Practical experience with improving and evaluating ML models
  • Proficient with ML frameworks (e.g., PyTorch, TensorFlow, JAX) and can implement custom training loops
  • Strong software engineering skills, particularly with Python
  • Building scalable data pipelines and ML infrastructure
  • Prompting and working with large language models

Nice to have

  • Implemented custom loss functions and evaluation metrics
  • Experience with experiment and evaluation tracking tools
  • Built systems that integrate training, evaluation, and deployment pipelines
  • Contributed to open-source machine learning or AI safety tools

What the JD emphasized

  • safety research
  • safety research initiatives
  • safety research
  • safety challenges
  • safety evaluation tasks
  • safety dimensions
  • AI safety tools
  • AI safety

Other signals

  • ML pipelines for training and evaluating safety classifiers
  • fine-tune language models for specific safety evaluation tasks
  • Build infrastructure for hyperparameter optimization and model selection
  • Create efficient data processing pipelines
  • Develop tooling to automate the generation, analysis, and classification of jailbreak attempts
  • Build evaluation frameworks that can systematically test model behaviors across safety dimensions
  • Create flexible interfaces for researchers to experiment with different model architectures and training configurations