GPU Programming Expert

Mistral AI · AI Frontier · Paris, France · Engineering & Infra

Mistral AI is seeking a GPU Programming Expert to optimize the serving and training of large language models at high speed. The role involves writing low-level CUDA kernels and distributed systems code to maximize GPU utilization, rethinking model architectures for efficient inference, and integrating this low-level code into an MLOps framework. The ideal candidate has deep expertise in GPU programming and distributed computing, a strong understanding of generative AI, and an interest in fine-tuning and model applications.

What you'd actually do

  1. Writing low-level code that takes full advantage of high-end GPUs (H100) and maxes out their capacity
  2. Rethinking various parts of the generative model architecture to make them more suitable for efficient inference
  3. Integrating efficient low-level code into a high-level MLOps framework

Skills

Required

  • CUDA kernel development
  • GPU programming
  • Distributed systems
  • MLOps
  • Generative AI understanding
  • Performance optimization

Nice to have

  • Fine-tuning language models
  • Using language models for applications

What the JD emphasized

  • Strong technical competence in writing custom CUDA kernels and pushing GPUs to their limits
  • Deep expertise in the distributed computation infrastructure of current-generation GPU clusters

Other signals

  • serving large language models at high speed on GPUs
  • writing low-level code to take full advantage of high-end GPUs
  • rethinking various parts of the generative model architecture for efficient inference
  • integrating efficient low-level code into a high-level MLOps framework
  • custom CUDA kernels
  • distributed computation infrastructure of current-generation GPU clusters