Member of Technical Staff - VLM

Black Forest Labs · Multimodal · Freiburg · Research

Research role focused on developing state-of-the-art vision-language models (VLMs) and integrating them into the FLUX generative AI stack, innovating on architectures and improving multimodal understanding to raise generation quality and controllability.

What you'd actually do

  1. Lead development and training of state-of-the-art multimodal vision-language models within the FLUX stack — innovating on architectures, not just applying existing ones
  2. Design fine-tuning strategies that adapt VLMs to specialized creative use cases (captioning, editing instructions, prompt enhancement) that general-purpose models can't handle
  3. Research integrations between VLM/LLM capabilities and our diffusion and flow pipelines — finding creative ways to improve generation quality and controllability without computational bottlenecks
  4. Evaluate emerging multimodal architectures, translating the best of recent research into practical improvements

Skills

Required

  • Pretrained or significantly advanced a VLM
  • Strong publication record or production track record on multimodal architectures
  • Deep understanding of how vision and language representations interact
  • Experience with distributed training at multi-node scale
  • Comfortable at the research/production boundary

Nice to have

  • Experience with diffusion or flow-based generative models
  • Experience composing autoregressive and diffusion paradigms

What the JD emphasized

  • You've pretrained or significantly advanced a VLM (not just SFT'd or LoRA'd an existing one) that was deployed in a production system or released publicly
  • Strong publication record or unambiguous production track record showing you push the frontier on multimodal architectures

Other signals

  • VLM research and integration
  • Multimodal generative models
  • Distributed training at scale