Member of Technical Staff, Developer Experience - MAI Superintelligence Team

Microsoft · Big Tech · Mountain View, CA +2 · Software Engineering

This role focuses on building and optimizing the infrastructure and developer experience for large-scale ML model training and inference, specifically for Microsoft's AI assistant, Copilot. Responsibilities include improving CI/CD pipelines, developing training tools, enhancing cloud infrastructure, and managing model hosting systems for inference and data generation. The goal is to speed up iteration and improve the quality of the AI models that power Microsoft's products.

What you'd actually do

  1. Design, implement, and optimize CI/CD pipelines for large-scale ML training workloads.
  2. Build developer tools and automation to simplify training and evaluation workflows.
  3. Improve and maintain core infrastructure across multi-cloud environments.
  4. Deploy and manage model hosting systems for inference and data generation.
  5. Collaborate with cross-functional teams to drive best practices in reliability, testability, and performance.

Skills

Required

  • Bachelor's Degree in Computer Science or related technical discipline AND 6+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR equivalent experience

Nice to have

  • Master's Degree in Computer Science or related technical field AND 8+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR equivalent experience

What the JD emphasized

  • highly effective
  • world-class consumer experiences
  • fast-paced environment
  • wear multiple hats
  • engineering, research, and everything in between
  • model architecture, data curation, training and inference infrastructures, evaluation protocols, alignment and reinforcement learning from human feedback (RLHF), and many other exciting topics at the cutting edge of AI
  • large compute-capacity
  • machine learning development ecosystem
  • scalability, reliability, and efficiency
  • iterate faster
  • higher-quality results
  • core engineering team
  • architectural improvements
  • shape the roadmap
  • critical software and hardware components
  • achieving business objectives
  • diverse user base
  • accelerate the next wave of AI-driven growth and innovation
  • push the boundaries of AI toward Humanist Superintelligence
  • ultra-capable systems that remain controllable, safety-aligned, and anchored to human values
  • amplifies human potential
  • humanity remained firmly in control
  • deliver breakthroughs that benefit society
  • advancing science, education, and global well-being
  • partner with incredible product teams
  • reach billions of users
  • immense positive impact
  • brilliant, highly ambitious, and low-ego individual
  • next generation of models
  • work from a designated Microsoft office at least four days a week
  • find a path to get things done despite roadblocks
  • get your work into the hands of users quickly and iteratively
  • design-driven, product development cycle

Other signals

  • building the next wave of capabilities of our personalized AI assistant, Copilot
  • contribute to the development of AI models that are powering our innovative products
  • improving the CI/CD system that powers the entire training codebase
  • designing and building tools that streamline model training workflows
  • enhancing our cloud infrastructure to ensure scalability, reliability, and efficiency
  • host and optimize models for both inference and data generation use cases