Principal Machine Learning Engineer

Microsoft · Big Tech · Redmond, WA +1 · Software Engineering

Principal Machine Learning Engineer for the Health Futures team, focused on accelerating the training of generative models, advancing model capabilities, and optimizing training, evaluation, and inference pipelines for health and life sciences applications.

What you'd actually do

  1. Lead the design and development of machine learning models and systems for health and life sciences applications, ensuring scalability and reliability.
  2. Define technical strategy and architecture for ML pipelines, including data ingestion, feature engineering, model training, evaluation, and deployment.
  3. Collaborate with interdisciplinary teams (including scientists, researchers, and software engineers) to envision and develop AI-augmented scientific systems.
  4. Mentor engineers and researchers, promoting best practices in ML development, experimentation, and responsible AI principles.
  5. Ensure security, privacy, and regulatory compliance across ML workflows and data handling.

Skills

Required

  • Bachelor's Degree in Computer Science or a related technical field
  • 6+ years of technical engineering experience coding in languages including, but not limited to, C++, C#, Java, JavaScript, or Python

Nice to have

  • Master's Degree in Computer Science or a related technical field
  • 6+ years of technical engineering experience, including significant work in machine learning or applied AI
  • Proven track record of designing and deploying large-scale ML or MLOps systems in research or product settings
  • Hands-on experience with large-scale distributed training of ML models
  • Deep expertise in ML algorithms, model optimization, and frameworks (e.g., PyTorch, TensorFlow)
  • Experience with one or more of: optimizing data mixes, mid-training, post-training, model merging, or model distillation
  • Familiarity with security and compliance standards for enterprise and health data
  • Demonstrated ability to communicate effectively and solve problems in a collaborative, research-driven environment

What the JD emphasized

  • large-scale distributed training
  • large-scale ML or MLOps systems
  • security, privacy, and regulatory compliance

Other signals

  • training generative models
  • advancing state-of-the-art model capabilities
  • full spectrum of model development
  • high-performance inferencing