Principal Software Engineer

Microsoft · Big Tech · Redmond, WA +1 · Software Engineering

Microsoft is seeking a Principal Software Engineer to design and implement large-scale, high-performance distributed systems for AI model serving. The role covers optimizing inference performance, managing complex infrastructure, and keeping AI services reliable and scalable.

What you'd actually do

  1. Design and implement large-scale, high-performance distributed systems for AI model serving.
  2. Optimize inference performance, latency, and throughput for various AI models.
  3. Develop and maintain robust infrastructure for deploying and managing AI services.
  4. Collaborate with AI researchers and engineers to integrate new models and techniques.
  5. Ensure the reliability, scalability, and security of AI serving platforms.

Skills

Required

  • Deep expertise in distributed systems design and implementation
  • Strong experience with high-performance computing and optimization techniques
  • Proficiency in C++, Python, or similar programming languages
  • Experience with AI model serving frameworks (e.g., TensorFlow Serving, TorchServe, Triton)
  • Understanding of GPU computing and optimization
  • Experience with cloud platforms (Azure, AWS, GCP)

Nice to have

  • Experience with Kubernetes and containerization technologies
  • Familiarity with AI/ML concepts and workflows
  • Experience with performance profiling and debugging tools

What the JD emphasized

  • large-scale
  • high-performance
  • distributed systems
  • AI model serving
  • inference performance

Other signals

  • Large-scale distributed systems
  • High-performance computing
  • AI model serving