Senior Software Engineer, CoreAI Workload Engines

Microsoft · Big Tech · Redmond, WA +2 · Software Engineering

Senior Software Engineer focused on building and optimizing foundational inference engines and APIs for large-scale AI inference across Azure. The role involves improving latency, throughput, availability, and cost for LLMs, working with OpenAI and open-source models, and developing experimentation capabilities for safe and rapid iteration.

What you'd actually do

  1. Optimize inference engines for OpenAI and open-source models by implementing and shipping performance/efficiency improvements across runtime, scheduling, and serving paths (latency, throughput, utilization, availability, and cost).
  2. Run experiments end-to-end: formulate hypotheses, implement engine changes (including Python/PyTorch integration points where relevant), analyze results, and ship improvements behind guardrails.
  3. Build and use experimentation capabilities for large-scale AI inference (experiment lifecycle, tracking, metric modeling, comparability standards, automated analysis) so the team can iterate quickly and safely.
  4. Own serving availability and efficiency for Azure OpenAI Service workloads through tiered experimentation, lean segmentation, and multi-modal utilization across heterogeneous fleets—turning findings into shipped engine improvements.
  5. Design and evolve inference serving architectures to improve utilization and latency using techniques such as disaggregated serving, multi-token prediction, KV offload/retrieval, and quantization—validated via staged rollouts and production guardrails.
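Much of the work above turns on measuring latency, throughput, and tail behavior before and after an engine change. As a rough illustration only (not from the posting), here is a minimal stdlib-only sketch of the kind of benchmarking loop involved; `infer` and `requests` are hypothetical stand-ins for an engine's forward call and its inputs, and a real harness would handle batching, device placement, and warm-up far more carefully:

```python
import time

def benchmark(infer, requests, warmup=10):
    """Measure per-request latency (p50/p99, ms) and throughput (req/s)
    for a callable. Purely illustrative; real inference benchmarking
    must account for batching, concurrency, and GPU synchronization."""
    for r in requests[:warmup]:          # warm caches/JIT before timing
        infer(r)
    latencies = []
    start = time.perf_counter()
    for r in requests:
        t0 = time.perf_counter()
        infer(r)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    latencies.sort()
    n = len(latencies)
    return {
        "p50_ms": 1000 * latencies[n // 2],
        "p99_ms": 1000 * latencies[min(n - 1, int(n * 0.99))],
        "throughput_rps": n / elapsed,
    }

# Toy stand-in for an inference call
stats = benchmark(lambda x: sum(i * i for i in range(x)), [1000] * 200)
```

In practice the same shape of harness feeds the experiment-tracking and comparability standards the role mentions: each engine change is benchmarked against a fixed request set, and the percentile deltas decide whether it ships.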

Skills

Required

  • Inference engines
  • LLM serving
  • Performance optimization
  • Systems software
  • Cloud infrastructure
  • GPU optimization
  • Experimentation frameworks
  • Benchmarking
  • Profiling
  • Debugging
  • Python
  • PyTorch

Nice to have

  • Networking
  • Storage
  • RDMA
  • InfiniBand
  • RoCE

What the JD emphasized

  • production-grade inference serving improvements
  • experimentation capabilities
  • large-scale inferencing
  • production guardrails
  • production reliability

Other signals

  • LLM inference engines
  • GPU inference
  • OpenAI and OSS models
  • Azure OpenAI Service
  • performance optimization
  • experimentation capabilities