Helix AI Engineer, Video Pretraining

Figure AI · Robotics · HQ · AI - Helix Team

Figure AI is seeking a Helix AI Engineer focused on Video Pretraining to lead the development of large-scale video foundation models. The role involves designing and training models on diverse datasets, developing pretraining strategies that capture temporal dynamics, and building models that learn transferable representations for downstream tasks in perception, prediction, and embodied reasoning. The engineer will also optimize training performance, implement efficient data pipelines, design evaluation frameworks, and collaborate closely with other AI teams.

What you'd actually do

  1. Design and train large-scale video foundation models on diverse datasets spanning internet-scale video and robot-collected data
  2. Develop pretraining strategies that capture temporal dynamics, motion, and object interaction from raw video sequences
  3. Build models that learn transferable representations for downstream tasks such as perception, tracking, prediction, and control
  4. Explore architectures for video understanding and generation, including transformer-based and diffusion-based approaches
  5. Implement efficient data pipelines and training strategies for high-throughput video ingestion and large-scale distributed training

Skills

Required

  • Experience training large-scale models on video data or other high-dimensional sequential modalities
  • Strong understanding of modern deep learning architectures for video, vision, or multimodal systems
  • Experience with large-scale pretraining, including dataset curation, training dynamics, and scaling laws
  • Proficiency in Python and deep learning frameworks such as PyTorch
  • Experience working with distributed training systems and large GPU clusters
  • Strong experimental rigor and ability to iterate quickly on model design and training strategies
  • Solid software engineering skills and ability to build scalable, reliable systems
  • Ability to operate independently and drive ambiguous, high-impact research directions

Nice to have

  • Experience working on frontier video models or multimodal foundation models
  • Background in video diffusion, autoregressive video modeling, or world models
  • Experience at leading AI labs such as OpenAI, Google DeepMind, Google, ByteDance, Midjourney, or Adobe
  • Experience with large-scale dataset construction and filtering for video pretraining
  • Familiarity with robotics, embodied AI, or learning from egocentric/first-person video
  • Publication record in machine learning, computer vision, or multimodal AI

What the JD emphasized

  • large-scale video foundation models
  • pretraining strategies
  • transferable representations
  • large-scale pretraining
  • large GPU clusters
  • large-scale dataset construction

Other signals

  • distributed training