What you'd actually do

Design, deploy, and maintain Figure's training clusters

Architect and maintain scalable deep learning frameworks for training on massive robot datasets

Work with AI researchers to implement large-scale training of new model architectures

Implement distributed training and parallelization strategies to reduce model development cycles

Implement tooling for data processing, model experimentation, and continuous integration

Skills

Required

Strong software engineering fundamentals
Python
PyTorch
Managing HPC clusters for deep neural network training
Building reliable backend systems

Nice to have

Managing cloud infrastructure (AWS, Azure, GCP)
Job scheduling / orchestration tools (SLURM, Kubernetes, LSF, etc.)
Configuration management tools (Ansible, Terraform, Puppet, Chef, etc.)

Figure is an AI robotics company developing autonomous general-purpose humanoid robots. The company's goal is to ship humanoid robots with human-level intelligence. Its robots are engineered to perform a variety of tasks in the home and commercial markets. Figure is headquartered in San Jose, CA.

Figure’s vision is to deploy autonomous humanoids at a global scale. Our Helix team is looking for an experienced Training Infrastructure Engineer to take our infrastructure to the next level. This role focuses on managing the training cluster and implementing distributed training algorithms, data loaders, and developer tools for AI researchers. The ideal candidate has experience building tools and infrastructure for large-scale deep learning systems.

Responsibilities

Design, deploy, and maintain Figure's training clusters
Architect and maintain scalable deep learning frameworks for training on massive robot datasets
Work with AI researchers to implement large-scale training of new model architectures
Implement distributed training and parallelization strategies to reduce model development cycles
Implement tooling for data processing, model experimentation, and continuous integration

Requirements

Strong software engineering fundamentals
Bachelor's or Master's degree in Computer Science, Robotics, Engineering, or a related field
Experience with Python and PyTorch
Experience managing HPC clusters for deep neural network training
Minimum of 4 years of professional, full-time experience building reliable backend systems

Bonus Qualifications

Experience managing cloud infrastructure (AWS, Azure, GCP)
Experience with job scheduling / orchestration tools (SLURM, Kubernetes, LSF, etc.)
Experience with configuration management tools (Ansible, Terraform, Puppet, Chef, etc.)

The US base salary range for this full-time position is $150,000 to $350,000 annually.

The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components/benefits depending on the specific role. This information will be shared if an employment offer is extended.

Helix AI Engineer, Training Infrastructure
