Member of Technical Staff - Post Training

Black Forest Labs · Multimodal · Freiburg · Research

This role focuses on the post-training pipeline for multimodal generative models, including data strategy, reward modeling, preference optimization, distillation, and safety tuning. The goal is to improve model quality and align models with human intent, with a strong emphasis on shipping these improvements to users.

What you'd actually do

  1. Own the full post-training pipeline end to end — from data curation and reward modeling through fine-tuning, preference optimization, distillation, safety tuning, evaluation, and deployment
  2. Advance techniques across the post-training stack: SFT, RLHF, RLAIF, DPO, preference learning, and reward modeling to align models with human intent and aesthetic judgment
  3. Work across modalities: text-to-image, image editing, multi-reference, and video post-training
  4. Build personalization and customization capabilities that let users adapt our models to their own creative style
  5. Design and maintain high-throughput fine-tuning and evaluation infrastructure to support rapid iteration across the research team
  6. Identify quality and alignment gaps through rigorous evaluation, then close them through targeted research and engineering

Skills

Required

  • SFT
  • preference optimization (DPO or RLHF)
  • distillation
  • safety tuning
  • PyTorch fluency
  • working across modalities (text-to-image, image editing, multi-reference, video)

Nice to have

  • specific distillation methods (LADD, DMD, consistency models, or similar)
  • building high-throughput eval pipelines
  • RLAIF
  • personalization

What the JD emphasized

  • Owned post-training for a frontier generative model through release
  • Measurable quality wins on human preferences or standard benchmarks
  • Deep experience across the post-training stack
  • Bias toward shipping: measurable model-quality improvements that reach users, not just papers

Other signals

  • post-training pipeline
  • multimodal models
  • human intent alignment
  • shipping models to users