What you'd actually do

Manage fine-tuning systems for large foundation models (SFT, PEFT, LoRA, adapters), including multi-node orchestration, checkpointing, failure recovery, and cost-efficient scaling.

Implement and maintain end-to-end training pipelines for Large Language Models.

Extend the fine-tuning and training systems with Reinforcement Fine-Tuning (RFT) and reinforcement learning.

Build distillation and reinforcement learning pipelines (e.g., preference optimization, policy optimization, reward modeling).

Dataset, model, and experiment management: versioning, lineage, evaluation, and reproducible fine-tuning at scale.

Skills

Required

Generative AI (Large Language Models, Multimodal)
training LLMs
fine-tuning LLMs
aligning LLMs
Reinforcement Learning
Reinforcement Fine-Tuning (RFT)
dataset management
model management
experiment management
multi-node orchestration
checkpointing
failure recovery
cost-efficient scaling
SFT
PEFT
LoRA
adapters
distillation
policy optimization
reward modeling
versioning
lineage
evaluation
reproducible fine-tuning

Nice to have

Golang
Python
PyTorch
vLLM
Performance optimizations on GPU systems
inference frameworks

Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only vertically integrated AI infrastructure company built from the ground up, we own and operate each layer of the stack — from electrons to tokens — to power the world's most ambitious AI workloads. When you join Crusoe, you join a team that is building the future, faster.

We're in the midst of the greatest industrial revolution of our time. The demand for AI compute is boundless, and power is a bottleneck. We're solving that — with an energy-first approach that makes AI infrastructure better for the world and faster for the people innovating with AI.

We're looking for problem-solving, opportunity-finding teammates with a sense of urgency, who believe in the scale of our ambition and thrive on a path not fully paved — people who want to grow their careers alongside a team of experts across energy, manufacturing, data center construction, and cloud services.

If you want to do the most meaningful work of your career, help our customers and partners advance their AI strategies, and be part of a high-performing team that believes in each other, come build with us at Crusoe.

About This Role:

The Senior Staff Software Engineer for the Model LifeCycle team will play a crucial role in building a comprehensive managed platform for the entire application development lifecycle, with a specific focus on leveraging Machine Learning models, including Large Language Models (LLMs).

What You’ll Be Working On:

Manage fine-tuning systems for large foundation models (SFT, PEFT, LoRA, adapters), including multi-node orchestration, checkpointing, failure recovery, and cost-efficient scaling.
Implement and maintain end-to-end training pipelines for Large Language Models.
Extend the fine-tuning and training systems with Reinforcement Fine-Tuning (RFT) and reinforcement learning.
Build distillation and reinforcement learning pipelines (e.g., preference optimization, policy optimization, reward modeling).
Dataset, model, and experiment management: versioning, lineage, evaluation, and reproducible fine-tuning at scale.

What You’ll Bring to the Team:

Advanced degree in Computer Science, Engineering, or a related field.
8+ years of industry experience leading and driving impactful projects in the AI space.
Experience in Generative AI (Large Language Models, Multimodal).
Hands-on experience training, fine-tuning, and aligning LLMs using Reinforcement Learning and Reinforcement Fine-Tuning (RFT) techniques.
Proactive and collaborative approach with the ability to work autonomously.
Passion for building cutting-edge AI products and solving challenging technical problems.

Bonus Points:

Proficiency in Golang or Python for large-scale, production-level services, and in PyTorch.
Contributions to open-source AI projects such as vLLM or similar frameworks.
Performance optimizations on GPU systems and inference frameworks.

Benefits:

Competitive compensation
Restricted Stock Units
Paid time off & paid holidays
Comprehensive health, dental & vision insurance
Employer contributions to HSA
Paid parental leave
Paid life insurance, short-term and long-term disability
Professional development & tuition reimbursement
Mental health & wellness support
Commuter benefits (parking & transit)
Cell phone stipend
401(k) Retirement plan with company match up to 4% of salary
Volunteer time off

Compensation Range

Compensation will be paid in the range of $237,600 - $318,240 + Bonus. Restricted Stock Units are included in all offers. Compensation will be determined by the applicant's knowledge, education, and abilities, as well as internal equity and alignment with market data.

Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.