Principal Group Engineering Manager

Microsoft · Big Tech · Hyderabad, TS, IN · Software Engineering

Principal Group Engineering Manager for the AIInfra team at Microsoft, responsible for building and scaling the AI data plane that powers LLM inferencing workloads across Microsoft and Azure customers. The role involves leading a large team to deliver inference capabilities for a wide range of LLMs, with a focus on reliability, efficiency, and ultra-low latency.

What you'd actually do

  1. Build and scale the core serving systems and smart request routing and distribution for all LLMs (OpenAI, Anthropic, Mistral, Grok, DeepSeek and many others).
  2. Learn and build cutting-edge innovations in the AI space, collaborating with the best and brightest who are leading it.
  3. Ship new product features and improvements at a high velocity.
  4. Design, implement and deliver AI services to support product offerings for large-scale LLM serving.
  5. Collaborate closely with product management and partner teams to align technical direction with business goals.

Skills

Required

  • Bachelor's degree in Computer Science or a related technical field AND 10+ years of technical engineering experience coding in languages including, but not limited to, C, C++, C#, or Java
  • 5+ years managing large teams
  • Understanding of distributed systems, specifically request serving at scale, including high-performance storage, distributed databases, and networking across global-scale infrastructure.

Nice to have

  • 6+ years of design and problem-solving experience, with understanding of system performance, scalability, and engineering best practices.
  • Experience shipping with high velocity and iterative approaches to track a north star.
  • Demonstrated experience in building high-quality, reliable systems at scale.
  • Ability to lead complex technical initiatives that span multiple teams and disciplines.
  • Customer-obsessed approach to problem solving, with empathy and a drive to deliver impactful solutions.

What the JD emphasized

  • core serving systems
  • large-scale LLM serving
  • ultra-low latency
  • high velocity
  • high-quality, reliable systems at scale

Other signals

  • LLM inferencing workloads
  • serve models at scale
  • request routing and distribution