Member of Technical Staff, AI Data - MAI Superintelligence Team

Microsoft · Big Tech · London, United Kingdom +1 · Software Engineering

The AI Data team at Microsoft AI is building the world's most advanced multimodal dataset to power frontier AI models. This role focuses on designing and developing pipelines for massive volumes of multimodal training data (text, audio, images, video) and on building infrastructure to store and process petabytes of data. The team partners with the pretraining and post-training teams to improve data recipes through experimentation, and collaborates with product teams to identify gaps in current models.

What you'd actually do

  1. Design and develop data pipelines that ingest enormous amounts of multimodal training data (text, audio, images, video).
  2. Build and maintain cutting-edge infrastructure that can store and process the petabytes of data needed to power models.
  3. Partner with the pretraining and post-training teams to improve our data recipe through rigorous, careful experimentation.
  4. Collaborate with the product team and other engineers and researchers across Microsoft AI to identify gaps in the current generation of models.

Skills

Required

  • Bachelor's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND experience in business analytics, data science, software development, data modelling or data engineering work OR equivalent experience.
  • Expertise in large-scale data engineering, ideally applied to AI.
  • Expertise in Spark, Kubernetes, or similar technologies.

What the JD emphasized

  • Expertise in large-scale data engineering, ideally applied to AI

Other signals

  • building multimodal datasets
  • powering frontier models
  • large-scale data engineering