Multimodal LLM Researcher (MLLM)

Pika Labs · AI Frontier · Palo Alto, CA · Research

Research role focused on real-time multimodal generative AI (text, image, video, audio) and agentic platforms, involving novel algorithm design, diffusion model work, training/finetuning, dataset curation, and publishing research.

What you'd actually do

  1. Lead and contribute to research efforts focused on real-time, multimodal generation—including text, image, video, and audio synthesis—as well as orchestration of agentic platform infrastructure
  2. Design and prototype novel algorithms and architectures for high-fidelity, real-time multimodal synthesis and interactive experiences
  3. Focus on real-time aspects of model inference and synthesis across modalities
  4. Work on diffusion model distillation and/or develop diffusion-based world models for multimodal applications
  5. Train and finetune autoregressive and diffusion models in LLM, VLM, or Audio LM contexts with a focus on real-time performance

Skills

Required

  • Python
  • PyTorch
  • TensorFlow
  • large language models
  • vision-language models
  • audio language models
  • deep learning
  • generative models
  • autoregressive models
  • diffusion models
  • real-time systems
  • agentic orchestration

Nice to have

  • video synthesis
  • audio synthesis
  • diffusion model distillation
  • world models
  • multimodal datasets

What the JD emphasized

  • 5+ years of relevant experience
  • Demonstrated impact as first author on major publications in top conferences or journals
  • Deep expertise in at least one area: language modeling (LLM), vision-language modeling (VLM), or audio language modeling (Audio LM)
  • Strong experience with generative models, including autoregressive and diffusion models, and their real-time deployment
  • Experience developing and deploying real-time systems and/or agentic orchestration infrastructure

Other signals

  • real-time multimodal generation
  • agentic platforms
  • LLM/VLM/Audio LM research
  • diffusion models
  • real-time inference