Principal Software Engineer, CoreAI

Microsoft · Big Tech · Redmond, WA +1 · Software Engineering

This role focuses on building and optimizing high-performance runtime systems for large-scale LLM inferencing, specifically for OpenAI chat and multimodal AI models. The engineer will be responsible for systems-level optimization, microservice design, and meeting latency, throughput, cost, and reliability targets for AI inference pipelines.

What you'd actually do

  1. Design and implement high-performance microservices and runtime components in C++.
  2. Optimize AI inferencing systems for latency, throughput, cost, and reliability at large scale.
  3. Debug and resolve complex production issues related to performance, scaling, and service reliability.
  4. Collaborate with cross-functional partners to integrate model inference pipelines into scalable infrastructure.
  5. Contribute to state-of-the-art multimodal inferencing systems supporting text, speech, and vision workloads.

Skills

Required

  • C++
  • systems programming
  • scalable cloud services
  • distributed systems
  • Kubernetes
  • containerized workloads
  • LLM inferencing infrastructure
  • CUDA

Nice to have

  • distributed GPU/CPU stack optimization for AI model inference
  • Azure OpenAI
  • SRE principles
  • operational excellence

What the JD emphasized

  • deep C++ expertise
  • large-scale LLM inferencing
  • multimodal AI models
  • systems-level optimization

Other signals

  • high-performance runtime systems