Senior Software Engineer - AI Frameworks

Microsoft · Big Tech · Redmond, WA +2 · Software Engineering

Senior Software Engineer role focused on optimizing large language model (LLM) deployment on Microsoft's MAIA AI accelerators and GPUs. The role involves building software across the stack, including PyTorch, inference systems (vLLM, SGLang), and performance-critical runtime/kernel components. Responsibilities include architecting tensor computation primitives, extending PyTorch for custom accelerators, improving inference stacks, and optimizing kernels for LLM inference and training workloads.

What you'd actually do

  1. Architect and implement efficient tensor computation primitives and software abstractions for custom AI accelerators.
  2. Develop and extend PyTorch features for model onboarding, optimization, and execution on custom AI accelerators.
  3. Contribute to and improve AI inference stacks such as vLLM and SGLang, including scheduling, KV cache management, and serving pipelines.
  4. Design, develop, profile, and optimize high-performance kernels for NPUs (MAIA) and GPUs to accelerate LLM inference and training workloads.
  5. Collaborate across disciplines to define requirements and deliver practical solutions to new technical challenges.
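The KV cache management mentioned in item 3 can be sketched as a toy, pure-Python block allocator in the spirit of vLLM's paged KV cache: sequences are handed fixed-size cache blocks on demand and return them when they finish. The names here (`BlockAllocator`, `append_token`) are illustrative assumptions, not vLLM's actual API.

```python
class BlockAllocator:
    """Toy paged KV-cache manager: hands out fixed-size blocks
    from a bounded pool, one block per `block_size` tokens."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size                 # tokens per block
        self.free = list(range(num_blocks))          # free block ids
        self.tables = {}                             # seq_id -> block ids
        self.lengths = {}                            # seq_id -> token count

    def append_token(self, seq_id: int) -> int:
        """Reserve cache space for one more token; a new block is
        allocated only when the sequence crosses a block boundary."""
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:                 # block boundary reached
            if not self.free:
                raise MemoryError("KV cache exhausted; preempt a sequence")
            self.tables.setdefault(seq_id, []).append(self.free.pop())
        self.lengths[seq_id] = n + 1
        return self.tables[seq_id][-1]               # block holding the token

    def free_sequence(self, seq_id: int) -> None:
        """Return all of a finished sequence's blocks to the pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


alloc = BlockAllocator(num_blocks=4, block_size=2)
for _ in range(3):                                   # 3 tokens -> 2 blocks
    alloc.append_token(seq_id=0)
```

The real systems add reference counting (for prefix sharing) and scheduling policies on top of this allocation core.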

Skills

Required

  • Bachelor's Degree in Computer Science or related technical field AND 4+ years of technical engineering experience coding in languages including, but not limited to, C, C++, or Python

Nice to have

  • Master's Degree in Computer Science or related technical field AND 6+ years of technical engineering experience
  • Bachelor's Degree in Computer Science or related technical field AND 8+ years of technical engineering experience
  • Experience with PyTorch internals, custom operators, hardware backends, or torch.compile/Dynamo-based optimization flows.
  • Experience with AI inference stacks such as vLLM, SGLang, or similar large-scale model serving systems.
  • Experience with NPU or GPU kernel development and optimization (e.g., CUDA, Triton, or accelerator-specific toolchains).
  • Familiarity with common LLM concepts such as attention mechanisms, KV caching, quantization (PTQ/QAT), and distributed parallelism strategies such as tensor (TP), pipeline (PP), and data (DP) parallelism.
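The PTQ side of the quantization bullet above can be illustrated with a minimal sketch: symmetric per-tensor int8 round-to-nearest, written in plain Python. This is a pedagogical toy, not any framework's quantization API; function names are made up for illustration.

```python
def quantize_int8(values):
    """Symmetric int8 PTQ: one per-tensor scale, zero-point fixed at 0."""
    amax = max(abs(v) for v in values) or 1.0
    scale = amax / 127.0                      # map [-amax, amax] -> [-127, 127]
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from int8 codes."""
    return [x * scale for x in q]


weights = [0.5, -1.0, 0.25, 2.0]
codes, scale = quantize_int8(weights)
recovered = dequantize(codes, scale)          # close to the originals
```

Real PTQ pipelines refine this with calibration data, per-channel scales, and clipping thresholds; QAT instead simulates this rounding during training so the model learns around it.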

What the JD emphasized

  • custom AI accelerators
  • LLM inference
  • PyTorch
  • vLLM
  • SGLang
  • kernel development

Other signals

  • AI accelerators
  • LLM deployment
  • inference systems
  • performance-critical runtime
  • custom silicon