Principal Software Engineer, CoreAI

Microsoft · Big Tech · Redmond, WA +3 · Software Engineering

This role focuses on building and optimizing the AI infrastructure for training agentic AI systems, including LLMs and SLMs, to achieve frontier-level performance. It involves developing scalable infrastructure and services for training, deploying, and monitoring these models in a cloud environment.

What you'd actually do

  1. Collaborate with engineers and researchers to build and optimize training infrastructure and tools for LLMs, SLMs, multimodal, and code-specific models.
  2. Design, build and improve services with high scalability and reliability.
  3. Design and implement services that serve production traffic and meet security and privacy requirements.
  4. Participate in efforts to deliver and improve engineering systems and practices to ensure service quality in complex cloud environments.
  5. Contribute to the deployment and monitoring of services in production environments.

Skills

Required

  • Bachelor's Degree in Computer Science or related technical field
  • 6+ years of technical engineering experience
  • Experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python
  • Ability to meet Microsoft, customer and/or government security screening requirements
  • Microsoft Cloud Background Check

Nice to have

  • Experience working with engineering teams to deliver large-scale software systems, preferably in AI, machine learning, graphics or related fields.
  • Ability to thrive in a fast-paced, collaborative environment and make progress amid ambiguity.
  • Enjoyment of working closely with cross-functional partners and teammates in an inclusive, curious culture.
  • Strong opinions about the best investments to make in establishing the most delightful and performant AI companion engineering system.

What the JD emphasized

  • lead and role model
  • track record of continuous improvement
  • agile, startup-style mindset
  • iterate quickly
  • pivot when needed
  • collaborate effectively
  • fast-paced, dynamic environments
  • security and privacy requirements

Other signals

  • training infrastructure
  • agentic AI systems
  • frontier-level performance
  • LLMs, SLMs, and agentic models