Research Engineer, Data Infrastructure

Mistral AI · AI Frontier · Paris, France · Research

Mistral AI is seeking a Research Engineer for its Data Infrastructure team to architect and scale massive distributed compute and storage systems for frontier model training and fine-tuning. The role involves building the backbone of the company's AI ecosystem, managing multi-cluster orchestration, designing future-proof storage solutions, and ensuring operational excellence for large-scale data platforms.

What you'd actually do

  1. Build & Scale: Help us reach our goal of operating massive distributed compute and storage systems.
  2. Global Orchestration: Architect and maintain multi-cluster orchestration layers to optimize workload placement across diverse hardware and regions.
  3. Design Future-Proof Storage: Architect our transition to modern storage formats to handle fine-tuning datasets at a scale that anticipates exabyte growth.
  4. Platform Engineering: Contribute to the development of our internal training platform, ensuring seamless model training and fine-tuning capabilities across Kubernetes- and SLURM-based environments.
  5. Metadata & Lineage: Implement and manage systems to provide clear visibility and lineage as our data and model pipelines grow in complexity.

Skills

Required

  • Python
  • Kubernetes
  • Data Infrastructure
  • MLOps
  • Infrastructure Engineering
  • Distributed Systems
  • Storage Systems
  • Orchestration

Nice to have

  • SLURM
  • Cloud-native deployments
  • Metadata systems
  • Data lineage

What the JD emphasized

  • 4+ years of experience in Data Infrastructure, MLOps, or Infrastructure Engineering
  • Python
  • Kubernetes-native tooling
  • ambiguity

Other signals

  • architecting the backbone of our frontier model training and fine-tuning ecosystem
  • building the specialized compute and data fabrics required to power the development of world-class AI
  • operate some of the largest compute fleets in production and build data lakes and metadata systems with a roadmap toward exabyte-scale architecture
  • implementing sophisticated multi-cluster orchestration and cloud-bursting capabilities
  • architecting the migration away from legacy orchestrators and implementing production-grade pipelines