Principal Software Engineer

Microsoft · Big Tech · Bengaluru, KA, IN · Software Engineering

The Azure AI Inferencing team is seeking a Principal Software Engineer to lead the architecture and design of a large-scale, high-throughput, low-latency model-serving platform for Azure OpenAI generative models, supporting billions of requests daily. The role involves end-to-end ownership of solution quality, cross-team collaboration, incident response, and championing security, privacy, and Responsible AI.

What you'd actually do

  1. Lead architecture and design of complex, distributed systems; make key technical decisions and mentor engineers on design tradeoffs and best practices.
  2. Own solution quality end‑to‑end, including test strategy, security testing, reliability, and operational readiness.
  3. Drive cross‑team collaboration, identifying dependencies, resolving conflicts, and aligning delivery plans across partner teams.
  4. Act as DRI for live systems, leading incident response, root‑cause analysis, and prevention through automation and operational improvements.
  5. Champion security, privacy, compliance, and Responsible AI, establishing security invariants, auditability, and monitoring across the system.

Skills

Required

  • Bachelor’s or Master’s degree in Computer Science, Engineering, Mathematics, or a related field, or equivalent industry experience.
  • 15+ years of professional software development experience building and operating complex systems.
  • Strong foundations in computer science fundamentals, including algorithms, data structures, systems design, and coding proficiency.
  • Proven experience in architecture and design of large-scale software systems, including making sound technical decisions and tradeoffs.
  • Demonstrated expertise in the software engineering lifecycle, including design, development, testing, quality assurance, deployment, and live-site operations.
  • Strong problem-solving, systems thinking, and decision-making skills with high attention to detail.
  • Experience collaborating across teams, handling technical dependency management and conflict resolution.
  • Excellent oral and written communication skills in English, with the ability to clearly explain complex technical concepts.

Nice to have

  • Experience operating real-time, high-throughput, low-latency services in production environments.
  • Hands-on experience designing, implementing, testing, and operating Azure AI or large-scale cloud services, meeting performance, scalability, reliability, and compliance requirements.
  • Experience driving or contributing to engineering efficiency tools or developer productivity improvements.
  • Exposure to security, compliance, and operational best practices for cloud-based or AI-driven services.

What the JD emphasized

  • highly reliable, available platform
  • billions of requests per day
  • high throughput/low latency
  • live systems
  • security, privacy, compliance, and Responsible AI
  • security screening requirements

Other signals

  • large OpenAI generative models
  • model-serving platform