Principal Software Engineer - CoreAI Model Inference & Serving

Microsoft · Big Tech · Redmond, WA +2 · Software Engineering

Principal Software Engineer role focused on building and scaling the AI data-plane for LLM inferencing across Microsoft and Azure. The role involves designing, coding, and shipping core serving systems, smart routing, and request distribution for a wide range of LLMs, aiming for reliability, efficiency, and ultra-low latency.

What you'd actually do

  1. Be a hands-on technical leader, designing, coding, and shipping core serving systems, smart routing, and request distribution for a broad portfolio of LLMs, including models from OpenAI, Mistral, Grok, DeepSeek, and others (see the sketch after this list).
  2. Build large-scale AI services and platform capabilities that power new products and customer experiences.
  3. Drive cutting-edge innovation in AI systems alongside world-class engineers and cross-functional partners.
  4. Lead through architecture, code reviews, mentorship, and technical excellence while staying close to implementation.
  5. Improve reliability, scalability, observability, efficiency, and performance across mission-critical services.
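
To make the "smart routing and request distribution" responsibility concrete, here is a minimal, purely illustrative sketch, not taken from the posting, of routing an inference request to the least-loaded backend registered for a requested model. All names (Router, Backend, Pick, the example URLs and model identifiers) are hypothetical.

```go
// Illustrative only: per-model, load-aware selection of an inference backend.
package main

import (
	"fmt"
	"sync"
)

// Backend is one serving endpoint for a particular model.
type Backend struct {
	URL      string
	Model    string // e.g. "gpt-4o", "mistral-large" (hypothetical identifiers)
	InFlight int    // current number of outstanding requests
}

// Router holds the registry of backends and picks one per request.
type Router struct {
	mu       sync.Mutex
	backends []*Backend
}

// Pick returns the least-loaded backend serving the requested model,
// or an error if no backend is registered for that model.
func (r *Router) Pick(model string) (*Backend, error) {
	r.mu.Lock()
	defer r.mu.Unlock()

	var best *Backend
	for _, b := range r.backends {
		if b.Model != model {
			continue
		}
		if best == nil || b.InFlight < best.InFlight {
			best = b
		}
	}
	if best == nil {
		return nil, fmt.Errorf("no backend serves model %q", model)
	}
	best.InFlight++ // count the request we are about to route
	return best, nil
}

func main() {
	r := &Router{backends: []*Backend{
		{URL: "https://pool-a.example.net", Model: "gpt-4o", InFlight: 3},
		{URL: "https://pool-b.example.net", Model: "gpt-4o", InFlight: 1},
		{URL: "https://pool-c.example.net", Model: "mistral-large"},
	}}
	if b, err := r.Pick("gpt-4o"); err == nil {
		fmt.Println("routing to", b.URL) // pool-b: lowest in-flight count
	}
}
```

A production AI data-plane would typically also weigh backend health, queue depth, token throughput, and tail latency when choosing a target, but the core idea of per-model, load-aware selection is the same.
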

Skills

Required

  • Bachelor's Degree in Computer Science or related technical field AND 6+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, or Java
  • Ability to meet Microsoft, customer and/or government security screening requirements

Nice to have

  • 4+ years of design and problem-solving experience, with an understanding of system performance, scalability, and engineering best practices.
  • Understanding of distributed systems, specifically request serving at scale (e.g., inferencing, L7 gateways, high-performance storage, and distributed databases across global-scale infrastructure).
  • Demonstrated experience in building high-quality, reliable systems at scale.
  • Experience using modern AI-assisted development tools and workflows to move faster, improve quality, and amplify engineering impact.
  • Customer-obsessed approach to problem solving, with empathy and a drive to deliver impactful solutions.

What the JD emphasized

  • core serving systems
  • request distribution
  • large-scale AI services
  • platform capabilities
  • reliability
  • scalability
  • observability
  • efficiency
  • performance

Other signals

  • serving LLMs at scale
  • low latency inference
  • AI data-plane