Helix AI Engineer, Pretraining

Figure AI · Robotics · HQ · AI - Helix Team

Figure AI is seeking a Helix AI Engineer, Pretraining to build large-scale foundation models for its humanoid robots. The role focuses on advancing pretraining methods across multimodal data (text, vision, and robot experience) to enable generalization, reasoning, and adaptability in embodied AI systems. Responsibilities include designing and training models, developing pretraining strategies, exploring architectures, optimizing distributed training pipelines, and collaborating with other AI teams. Requirements include experience training large-scale foundation models, an understanding of modern deep learning architectures, and proficiency in Python and PyTorch.

What you'd actually do

  1. Design and train large-scale foundation models across multimodal data (e.g., text, vision, and robot data)
  2. Develop pretraining strategies that improve generalization, reasoning, and transfer to downstream embodied tasks
  3. Explore and implement architectures including transformer-based and emerging foundation model paradigms
  4. Work on scaling laws, dataset mixture design, and training dynamics for frontier models
  5. Build and optimize large-scale distributed training pipelines across multi-node GPU clusters
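To make point 4 concrete: "scaling laws" work in this context typically means deciding how to split a fixed compute budget between model size and training tokens. A minimal sketch, using the rough C ≈ 6ND compute approximation and a ~20-tokens-per-parameter ratio in the spirit of Hoffmann et al. (Chinchilla) — both are illustrative assumptions here, not Figure's actual recipe:

```python
import math

def compute_optimal_allocation(compute_flops: float, tokens_per_param: float = 20.0):
    """Split a training compute budget between parameters and tokens.

    Uses the common approximation C ~= 6 * N * D (FLOPs per token ~= 6N)
    with a fixed token-to-parameter ratio D = r * N, so:
        C = 6 * r * N^2  =>  N = sqrt(C / (6 * r)),  D = r * N
    The ratio r = 20 is an illustrative Chinchilla-style assumption.
    """
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Example: a 1e21 FLOP budget -> roughly a ~3B-parameter model
# trained on roughly ~60B tokens under these assumptions.
n, d = compute_optimal_allocation(1e21)
```

Dataset-mixture design then layers on top of this: given the token budget D, one chooses how to apportion it across sources (text, vision, robot experience) to maximize downstream transfer.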

Skills

Required

  • Python
  • PyTorch
  • Deep learning architectures
  • Transformers
  • Large-scale distributed training
  • Software engineering

Nice to have

  • Frontier foundation models
  • Multimodal pretraining
  • Scaling laws
  • Dataset curation
  • RLHF
  • Reward modeling
  • Alignment methods
  • Embodied AI
  • Robotics
  • Real-world deployment constraints
  • Publication record

What the JD emphasized

  • Experience training large-scale foundation models or working on pretraining for LLMs or multimodal systems
  • Strong experimental rigor and ability to iterate on model design and training strategies
  • Solid software engineering skills and ability to build scalable, reliable systems
  • Ability to operate independently and drive ambiguous, high-impact technical problems

Other signals

  • foundation models
  • pretraining
  • multimodal data
  • robotics
  • large-scale distributed training