Software Engineer 2

Microsoft · Big Tech · Redmond, WA +1 · Software Engineering

Software Engineer 2 on the Microsoft Azure AI Inference platform team, responsible for hosting, optimizing, and scaling the inference stack for Azure AI Foundry models, including models from OpenAI and other open-source providers. The role focuses on designing and implementing core inference infrastructure, improving performance and efficiency for LLMs and GenAI models, and scaling the platform to meet growing demand.

What you'd actually do

  1. Design and implement core inference infrastructure for serving frontier AI models in production.
  2. Identify and drive improvements to end-to-end inference performance and efficiency of state-of-the-art LLMs and GenAI models from OpenAI, Anthropic, and xAI hosted on Azure AI Foundry.
  3. Design and implement efficient load scheduling and balancing strategies by leveraging key insights and features of the model and workload.
  4. Scale the platform to support growing inference demand while maintaining high availability.
  5. Deliver critical capabilities required to serve the latest and greatest Gen AI models, such as GPT-5, Realtime audio, and Sora, and enable fast time to market for them.

Skills

Required

  • Bachelor’s degree in Computer Science or a related technical field AND 2+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, or Golang, OR equivalent experience.
  • Ability to meet Microsoft, customer, and/or government security screening requirements for this role.

Nice to have

  • Technical background with a solid foundation in software engineering principles, distributed computing, and system architecture.
  • Experience working on high-scale, reliable online systems.
  • Experience with real-time online services requiring low latency and high throughput.
  • Experience working with Layer 7 (L7) network proxies and gateways.
  • Knowledge of network architecture and concepts, including HTTP and TCP protocols, authentication, and session management.
  • Knowledge of and experience with open-source software, Docker, Kubernetes, and programming languages such as C++, Go, or equivalents.
  • Cross-team collaboration skills and a desire to work in a team of researchers and developers.
  • Ability to independently lead projects.

What the JD emphasized

  • state-of-the-art LLMs and GenAI models
  • low latency and high throughput
  • latest and greatest Gen AI models

Other signals

  • serving billions of inferences per day
  • high-throughput, low-latency environments
  • scale the platform to support the growing inferencing demand
  • Deliver critical capabilities required to serve the latest and greatest Gen AI models