Helix AI Engineer, Modeling

at Figure AI · Robotics · HQ · AI - Helix Team

AI Engineer developing core model architectures and learning approaches for humanoid robot autonomy, spanning perception, reasoning, and action across multimodal inputs. The role covers designing, training, and deploying models, with an emphasis on advancing multimodal learning and generalization for embodied AI systems.

Figure is an AI robotics company developing autonomous general-purpose humanoid robots. Our goal is to build embodied AI systems that can perceive, reason, and act in the real world. Figure is headquartered in San Jose, CA, and this role requires 5 days/week in-office collaboration.

Our Helix team is responsible for developing the core AI systems that power humanoid autonomy. We are looking for a Helix AI Engineer, Modeling to design and advance the core model architectures and learning approaches that enable perception, reasoning, and action in embodied systems.

This role focuses on developing new modeling approaches across vision, language, and action—spanning representation learning, multimodal fusion, and model capabilities that directly impact robot intelligence.

Responsibilities

  • Design and develop model architectures for perception, reasoning, and action across multimodal inputs (e.g., vision, language, proprioception)
  • Build models that learn structured representations of the world, including objects, dynamics, and interactions
  • Advance multimodal learning approaches, including fusion, alignment, and cross-modal reasoning
  • Improve model capabilities in areas such as generalization, robustness, and long-horizon reasoning
  • Work across the model lifecycle, from initial research and prototyping to training and deployment
  • Collaborate closely with pretraining, video, generative, RL, and robot learning teams to integrate modeling advances into the full autonomy stack
  • Design experiments and evaluation frameworks to understand model behavior and guide iteration
  • Contribute to the development of new modeling paradigms for embodied AI systems

Requirements

  • Experience designing and training deep learning models for vision, language, or multimodal systems
  • Strong understanding of modern model architectures (e.g., transformers and related approaches)
  • Experience improving model performance through architectural innovation and experimentation
  • Proficiency in Python and deep learning frameworks such as PyTorch
  • Strong experimental rigor and ability to iterate on model design and performance
  • Solid software engineering skills and ability to build reliable, maintainable systems
  • Ability to operate independently and drive ambiguous, high-impact technical problems

Bonus Qualifications

  • Experience with multimodal models (vision-language or vision-language-action systems)
  • Background in representation learning, world models, or structured prediction
  • Experience working on frontier models at companies such as OpenAI, Google DeepMind, Anthropic, Meta, or xAI
  • Familiarity with embodied AI, robotics, or real-world ML systems
  • Experience with large-scale training or distributed systems
  • Publication record in machine learning, computer vision, NLP, or multimodal AI

The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components/benefits depending on the specific role. This information will be shared if an employment offer is extended.