Machine Learning Systems Engineer, Encodings and Tokenization

Anthropic · AI Frontier · AI Research & Engineering

A Machine Learning Systems Engineer who develops and optimizes encodings and tokenization systems for Anthropic's Finetuning workflows. The role bridges the Pretraining and Finetuning teams, building infrastructure that affects how models learn from data and improving training efficiency. It requires strong software engineering and ML expertise, with experience in ML systems, data pipelines, or ML infrastructure.

What you'd actually do

  1. Design, develop, and maintain tokenization systems used across Pretraining and Finetuning workflows
  2. Optimize encoding techniques to improve model training efficiency and performance
  3. Collaborate closely with research teams to understand their evolving needs around data representation
  4. Build infrastructure that enables researchers to experiment with novel tokenization approaches
  5. Implement systems for monitoring and debugging tokenization-related issues in the model training pipeline

Skills

Required

  • Significant software engineering experience
  • Demonstrated machine learning expertise
  • Proficiency in Python
  • Familiarity with modern ML development practices
  • Strong analytical skills

Nice to have

  • Working with machine learning data processing pipelines
  • Building or optimizing data encodings for ML applications
  • Implementing or working with BPE, WordPiece, or other tokenization algorithms
  • Performance optimization of ML data processing systems
  • Multi-language tokenization challenges and solutions
  • Research environments where engineering directly enables scientific progress
  • Distributed systems and parallel computing for ML workflows
  • Large language models or other transformer-based architectures

What the JD emphasized

  • Finetuning workflows
  • Pretraining
  • Finetuning
  • tokenization systems
  • model training pipeline

Other signals

  • Develop and optimize encodings and tokenization systems for Finetuning workflows
  • Bridge between Pretraining and Finetuning teams
  • Build critical infrastructure impacting model learning and data interpretation