Member of Technical Staff, Compute Orchestration & Scheduling - MAI Superintelligence Team

Microsoft · Big Tech · Mountain View, CA +2 · Software Engineering

This role focuses on building and optimizing the compute orchestration and scheduling layer for large-scale AI model pretraining, using Kubernetes and Ray. It covers workload placement, scaling, reliability, and developer experience, and directly affects the infrastructure used to develop and deploy AI models.

What you'd actually do

  1. Develop and tune scalable pretraining software for Nvidia GB200 72NVL CX8 and AMD MIxxx architectures
  2. Benchmark GB200 and AMD MIxxx GPU clusters
  3. Gather data and insights to develop the pretraining compute roadmap
  4. Care deeply about conversational AI and its deployment
  5. Actively contribute to the development of the AI models that power our innovative products

Skills

Required

  • Bachelor's Degree in Computer Science or related technical field
  • 6+ years technical engineering experience
  • Experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python

Nice to have

  • Master's Degree in Computer Science or related technical field
  • 8+ years technical engineering experience
  • 12+ years technical engineering experience
  • Experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python

What the JD emphasized

  • pretraining scalable software
  • Nvidia GB200 72NVL CX8
  • AMD MIxxx architectures
  • Benchmark GB200 and AMD MIxxx GPU clusters
  • pretraining compute roadmap
  • Kubernetes
  • Ray
  • compute orchestration and scheduling layer

Other signals

  • compute orchestration
  • scheduling layer
  • Kubernetes
  • Ray
  • workload placement
  • scaling
  • reliability
  • developer experience
  • pretraining scalable software
  • GPU clusters
  • pretraining compute roadmap