What you'd actually do

Design, train, and deploy multimodal agents that operate autonomously for hours to days

Build agents that reason from raw sensory inputs (pixels, environment state, proprioception) to structured actions

Implement episodic memory systems for persistent state, retrieval, and long-horizon reasoning

Develop planning, reasoning, and tool-use mechanisms for multi-step task execution

Build reliable perception → reasoning → action loops with strong stability and failure recovery

Skills

Required

Experience building autonomous agents that run continuously and complete multi-step tasks
Experience developing agents that reason from pixel inputs or raw environment observations
Experience implementing agent memory, planning, reasoning, or tool-use systems
Experience training or fine-tuning multimodal or foundation models
Strong proficiency in Python and modern deep learning frameworks (e.g., PyTorch)
Strong experimental rigor and ability to design, analyze, and iterate on ML systems
Strong software engineering skills and ability to build reliable, maintainable systems
Ability to work independently and own complex technical problems end-to-end

Nice to have

Experience with embodied AI, robotics learning, or robot policy training
Experience building multimodal foundation models (vision-language or vision-language-action)
Background in agentic AI systems or long-horizon planning architectures
Experience working with large-scale distributed training systems
Publication record in machine learning, robotics, or embodied AI

What the JD emphasized

autonomous general-purpose humanoid robots

humanoid robots with human-level intelligence

multimodal reasoning systems

agents that operate autonomously

episodic memory systems

long-horizon reasoning

tool-use mechanisms

multi-step task execution

perception → reasoning → action loops

evaluation harnesses, benchmarks, and metrics

data studies across the training lifecycle

pretraining, mid-training, and post-training

reinforcement learning, reward modeling, and post-training techniques

robot reasoning, planning, and reliability

scalable model training, distributed experimentation, and agent evaluation

Figure AI is an AI robotics company developing autonomous general-purpose humanoid robots. The goal of the company is to ship humanoid robots with human-level intelligence. Its robots are engineered to perform a variety of tasks in the home and commercial markets. Figure is headquartered in San Jose, CA.

Our goal is to create embodied AI systems that can perceive the world through pixels, reason over memory, and reliably execute complex tasks over minutes to hours in real environments. We are looking for a Helix AI Engineer, Agentic Systems, with experience building multimodal reasoning systems: agents that operate autonomously from raw sensory input, maintain episodic memory, plan over long horizons, and execute reliably within structured evaluation harnesses (for example, pixels-to-actions computer-use agents). This role focuses on developing the agent architectures and infrastructure that enable robots to function as persistent, reliable embodied agents in the real world.

Responsibilities

Design, train, and deploy multimodal agents that operate autonomously for hours to days
Build agents that reason from raw sensory inputs (pixels, environment state, proprioception) to structured actions
Implement episodic memory systems for persistent state, retrieval, and long-horizon reasoning
Develop planning, reasoning, and tool-use mechanisms for multi-step task execution
Build reliable perception → reasoning → action loops with strong stability and failure recovery
Design evaluation harnesses, benchmarks, and metrics to measure agent reasoning, planning, and reliability
Design and run data studies across the training lifecycle, including pretraining, mid-training, and post-training
Apply reinforcement learning, reward modeling, and post-training techniques to improve agent reasoning and reliability in real-world environments
Develop evaluation frameworks and benchmarks to measure robot reasoning, planning, and task success across diverse scenarios
Build infrastructure for scalable model training, distributed experimentation, and agent evaluation
Work closely with other teams to integrate agent models into the full humanoid autonomy stack

Requirements

Experience building autonomous agents that run continuously and complete multi-step tasks
Experience developing agents that reason from pixel inputs or raw environment observations
Experience implementing agent memory, planning, reasoning, or tool-use systems
Experience training or fine-tuning multimodal or foundation models
Strong proficiency in Python and modern deep learning frameworks (e.g., PyTorch)
Strong experimental rigor and ability to design, analyze, and iterate on ML systems
Strong software engineering skills and ability to build reliable, maintainable systems
Ability to work independently and own complex technical problems end-to-end

Bonus Qualifications

Experience with embodied AI, robotics learning, or robot policy training
Experience building multimodal foundation models (vision-language or vision-language-action)
Background in agentic AI systems or long-horizon planning architectures
Experience working with large-scale distributed training systems
Publication record in machine learning, robotics, or embodied AI
Passion for building autonomous humanoid robots that operate in the real world

The US base salary range for this full-time position is between $150,000 and $350,000 annually.

The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components/benefits depending on the specific role. This information will be shared if an employment offer is extended.

