Principal Applied AI Engineering Manager

Microsoft · Big Tech · Redmond, WA +1 · Business Applications

This role leads an AI engineering pod focused on building and shipping production agentic AI services for Azure customer support. Responsibilities include hiring and coaching engineers, driving the end-to-end lifecycle of AI agents (LLM orchestration, multi-agent workflows, RAG, evaluation), setting technical direction, ensuring production health and reliability, building evaluation systems, influencing roadmaps, driving business impact, ensuring responsible AI, and developing engineering talent.

What you'd actually do

  1. Build and lead the AI engineering pod — Hire, coach, and develop a team of Applied AI Engineers; foster an inclusive, high-trust culture where engineers ship production AI services with ownership and velocity.
  2. Own engineering execution for agentic AI services — Drive the end-to-end lifecycle of production AI agents from spec to deployment, including LLM orchestration, multi-agent workflows, RAG pipelines, and evaluation systems.
  3. Set technical direction and engineering standards — Define architecture patterns, code quality bar, evaluation frameworks, deployment practices, and observability standards for the AI engineering pod. Ensure production-quality C# and Python with TDD, CI/CD, staged rollouts, and full observability.
  4. Own production health and reliability — Ensure deployed AI agents meet quality, performance, and safety standards. Drive incident response, root cause analysis, and continuous improvement for agent systems in production.
  5. Build and maintain evaluation systems — Establish evaluation frameworks including rubrics, golden datasets, and judge agents to validate agent correctness and safety before and after production deployment. Ensure agents graduate through shadow mode to autonomous operation with eval gates at each stage.
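
As a rough illustration of the eval-gated graduation described in item 5, the sketch below promotes an agent one stage at a time, and only when its evaluation scores clear that stage's gate. Everything specific here is an assumption made for illustration and does not come from the listing: the intermediate ASSISTED stage, the metric names, the threshold values, and the helper names passes_gate and next_stage. Only "shadow mode" and "autonomous operation" appear in the posting.

```python
from enum import IntEnum


class Stage(IntEnum):
    # Hypothetical rollout stages. ASSISTED is an assumed intermediate step.
    SHADOW = 0
    ASSISTED = 1
    AUTONOMOUS = 2


# Hypothetical minimum scores an agent must reach before leaving each stage.
GATES = {
    Stage.SHADOW: {"correctness": 0.90, "safety": 0.99},
    Stage.ASSISTED: {"correctness": 0.95, "safety": 0.995},
}


def passes_gate(stage, scores):
    """Return True if every gated metric meets its minimum for this stage.

    A metric missing from `scores` counts as 0.0, so it fails the gate.
    """
    thresholds = GATES[stage]
    return all(
        scores.get(metric, 0.0) >= minimum
        for metric, minimum in thresholds.items()
    )


def next_stage(stage, scores):
    """Promote by at most one stage, and only if the current gate passes."""
    if stage is Stage.AUTONOMOUS:
        return stage  # already at the final stage
    if passes_gate(stage, scores):
        return Stage(stage + 1)
    return stage


if __name__ == "__main__":
    shadow_scores = {"correctness": 0.92, "safety": 0.995}
    print(next_stage(Stage.SHADOW, shadow_scores).name)
```

In this sketch the scores would come from whatever rubrics, golden datasets, or judge agents the team uses; the gate itself only compares numbers against thresholds, which keeps the promotion decision easy to audit.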

Skills

Required

  • Python
  • C#
  • LLM-based systems
  • prompt engineering
  • RAG architectures
  • agent frameworks
  • cloud platforms (Azure)
  • cloud-native service development
  • microservices
  • containers
  • CI/CD
  • people management
  • team leadership

Nice to have

  • low-code application development
  • engineering product/technical program management
  • data analysis
  • product development
  • Dataverse
  • Power Applications
  • managing and configuring artificial intelligence solutions
  • chatbots

What the JD emphasized

  • production AI services
  • agentic AI services
  • LLM orchestration
  • multi-agent coordination
  • RAG-based systems
  • evaluation frameworks
  • production health and reliability
  • responsible AI

Other signals

  • shipping production agentic AI services
  • LLM orchestration
  • multi-agent coordination
  • RAG-based systems
  • evaluation frameworks