Senior AI Software Architect

Microsoft · Big Tech · Redmond, WA (+1 other location) · Software Engineering

Senior AI Software Architect role focused on enabling AI models and optimizing their performance on Maia, Microsoft's custom AI accelerator hardware. The work involves PyTorch, quantization, parallelization, and inference pipelines.

What you'd actually do

  1. Port and optimize large-scale AI models (e.g., foundation models, diffusion models, YOLO) to run efficiently on Maia hardware.
  2. Apply techniques such as KV cache quantization (e.g., BF16 → FP8), efficient checkpointing, and resharding to make inference and training more efficient.
  3. Collaborate on improving inference pipelines, including KV caching in SGLang and vLLM, and tune performance at the PyTorch level.
  4. Partner with hardware architects and kernel developers for co-design discussions.
  5. Communicate effectively with multiple stakeholders to align on performance goals and deliverables.
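To make the BF16 → FP8 KV cache example concrete, here is a minimal pure-Python sketch of per-tensor scaled quantization into the FP8 E4M3 format (4 exponent bits, 3 mantissa bits, largest finite value 448). It assumes a single scale per KV block and simulates FP8 rounding with ordinary floats so it runs anywhere; in PyTorch 2.1+ the stored dtype would be `torch.float8_e4m3fn`. The function names are hypothetical and are not taken from SGLang, vLLM, or any Maia SDK.

```python
import math

# FP8 E4M3 ("fn" variant): 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits.
E4M3_MAX = 448.0           # largest finite magnitude: 1.75 * 2**8
E4M3_MIN_NORMAL_EXP = -6   # smallest normal is 2**-6; subnormals are spaced 2**-9 apart
E4M3_MANTISSA_BITS = 3


def round_to_e4m3(x):
    """Round x to the nearest E4M3-representable value (ties to even), saturating at +/-448."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)
    # frexp gives mag = m * 2**e with 0.5 <= m < 1, so floor(log2(mag)) == e - 1.
    _, e = math.frexp(mag)
    exp = max(e - 1, E4M3_MIN_NORMAL_EXP)     # subnormals reuse the min-normal spacing
    step = 2.0 ** (exp - E4M3_MANTISSA_BITS)  # gap between adjacent representable values
    return sign * min(round(mag / step) * step, E4M3_MAX)


def quantize_kv_fp8(values):
    """Hypothetical helper: per-tensor scaled quantization of one KV block.

    Returns (FP8-representable values, scale)."""
    amax = max((abs(v) for v in values), default=0.0)
    scale = amax / E4M3_MAX if amax > 0 else 1.0  # map the largest magnitude onto 448
    return [round_to_e4m3(v / scale) for v in values], scale


def dequantize_kv_fp8(q, scale):
    """Hypothetical helper: recover approximate original values by multiplying the scale back."""
    return [v * scale for v in q]


# Toy values standing in for BF16 K/V entries of one cache block.
kv_block = [0.02, -1.5, 3.75, 0.0003, -7.0]
q, scale = quantize_kv_fp8(kv_block)
restored = dequantize_kv_fp8(q, scale)
print(scale, q)
print(restored)
```

A single per-tensor scale keeps the sketch short; the trade-off is that one outlier sets the scale for the whole block, which is why finer-grained scales (per head or per token) are a common alternative.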

Skills

Required

  • C
  • C++
  • C#
  • Java
  • JavaScript
  • Python
  • PyTorch

Nice to have

  • model optimization techniques
  • quantization techniques: post-training quantization (PTQ) and quantization-aware training (QAT)
  • KV cache quantization
  • parallelization strategies
  • distributed training concepts
  • sharding
  • allreduce
  • AI inference stacks
  • SGLang
  • vLLM
  • performance profiling
  • Triton kernels
  • CUDA programming
  • AI accelerator hardware
  • embedded systems
  • efficient model checkpointing
  • resharding scripts
  • large-scale model deployments
  • serving at scale
  • ONNX
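Resharding, which appears both among the duties and among the nice-to-have skills, typically means converting checkpoint shards written for one parallel layout into shards for another. Below is a minimal sketch that assumes an even tensor-parallel split of a 2-D weight along a single axis, PyTorch's `nn.Linear` `[out_features, in_features]` weight layout, and nested Python lists standing in for tensors. The function names are hypothetical. Production scripts also have to handle checkpoint I/O, uneven splits, and fused weights (for example a combined QKV projection) that may need to be split per component rather than as one block.

```python
import itertools


def split_shards(matrix, world_size, axis):
    """Split a 2-D matrix (list of rows) evenly into world_size shards along axis 0 or 1."""
    size = len(matrix) if axis == 0 else len(matrix[0])
    if size % world_size != 0:
        raise ValueError(f"dim {axis} of size {size} is not divisible by {world_size}")
    per_rank = size // world_size
    if axis == 0:
        return [matrix[r * per_rank:(r + 1) * per_rank] for r in range(world_size)]
    return [[row[r * per_rank:(r + 1) * per_rank] for row in matrix] for r in range(world_size)]


def merge_shards(shards, axis):
    """Inverse of split_shards: concatenate shards (in rank order) back into the full matrix."""
    if axis == 0:
        return [row for shard in shards for row in shard]
    # axis 1: stitch together the i-th row of every shard
    return [list(itertools.chain.from_iterable(rows)) for rows in zip(*shards)]


def reshard(shards, new_world_size, axis):
    """Convert shards saved for one tensor-parallel degree into shards for another."""
    return split_shards(merge_shards(shards, axis), new_world_size, axis)


# Toy 8x4 weight in [out_features, in_features] layout; values encode (row, col).
full = [[10 * i + j for j in range(4)] for i in range(8)]
tp4_rows = split_shards(full, 4, axis=0)  # output-dim split, as in column-parallel linear layers
tp2_rows = reshard(tp4_rows, 2, axis=0)   # TP=4 checkpoint -> TP=2
tp4_cols = split_shards(full, 4, axis=1)  # input-dim split, as in row-parallel linear layers
tp2_cols = reshard(tp4_cols, 2, axis=1)
```

Merging to the full tensor and re-splitting is the simplest approach and easy to verify, at the cost of materializing the whole weight in memory; for very large models, a script may instead map old shard ranges directly onto new ones.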

What the JD emphasized

  • PyTorch-based model development
  • quantization techniques
  • parallelization strategies
  • Maia hardware
  • inference
  • training
  • performance optimization

Other signals

  • model enablement
  • performance optimization
  • inference stack development