Principal Software Engineer, Experimentation Platform - CoreAI

Microsoft · Big Tech · Redmond, WA +2 · Software Engineering

This role is for a Principal Software Engineer on the Experimentation Platform team within Microsoft's CoreAI division. The platform enables high-scale online experimentation for AI systems and product features, accelerating product learning and driving progress across Microsoft's AI ecosystem. The engineer will lead architecture and development of this critical infrastructure, focusing on systems that help teams ship better AI experiences faster through rigorous experimentation and responsible AI practices.

What you'd actually do

  1. Champion and improve AI tools and practices across the software development lifecycle (SDLC), incorporating appropriate controls over AI-generated assets.
  2. Lead by example across teams to produce extensible, maintainable, well-tested, secure, and performant code; identify and establish coding best practices, create and apply metrics to drive code quality and stability, and mentor engineers to continuously raise the engineering bar.
  3. Own and lead the architecture of complex product solutions, driving design discussions, evaluating new technologies to solve problems, and ensuring system architecture meets performance, scalability, resiliency and disaster recovery requirements.
  4. Lead cross-team collaboration to identify dependencies, negotiate delivery schedules, drive alignment across partner teams, and ensure proper end-to-end testing, live-site coverage, scalability and performance before going live.
  5. Drive engineering excellence across products; lead efforts targeting zero-touch deployment, production reliability, and security hardening for both protections and detections.

Skills

Required

  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • Ability to meet Microsoft, customer and/or government security screening requirements.

Nice to have

  • Master’s Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor’s Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • Extensive experience architecting and operating large-scale distributed systems on cloud platforms (Azure, AWS, GCP), with demonstrated ownership of critical production infrastructure serving millions of users.
  • Track record of designing highly scalable, resilient service architectures with strong emphasis on fault tolerance, disaster recovery, and cost optimization at scale.
  • Deep experience using observability tools (logging, metrics, distributed tracing) to diagnose complex cross-service issues and drive systemic reliability improvements across multiple products.
  • Proven experience mentoring senior engineers, driving technical direction, conducting design reviews, and raising the engineering bar across teams.
  • Experience with experimentation platforms, A/B testing at scale, and statistical methodologies for measuring product impact and driving data-informed ship decisions.
  • Experience leading security hardening efforts, threat modeling, and incident response processes for production systems.
  • Experience championing AI-assisted development workflows and establishing responsible AI coding practices across engineering teams.

What the JD emphasized

  • high-scale
  • accelerates product learning
  • AI systems
  • rigorous experimentation
  • responsible AI
  • large-scale distributed systems
  • critical production infrastructure
  • highly scalable, resilient service architectures
  • experimentation platforms, A/B testing at scale

Other signals

  • experimentation platform
  • product learning
  • high-scale online experimentation
  • AI ecosystem
  • AI capabilities