Software Engineering Manager, Cloud ML Compute Services (Mandarin, English)

Google · Big Tech · Singapore

Software Engineering Manager for Google Cloud's ML Compute Services, focusing on optimizing customer AI/ML models on Google Cloud infrastructure. The role involves leading a team, providing technical guidance, partnering with customers on performance, and collaborating with internal teams to enhance AI workload support.

What you'd actually do

  1. Recruit, mentor, and provide technical guidance to the team, including the roadmap and direction for team deliverables, and ensure efficient execution.
  2. Partner with customers to optimize the performance of their AI/ML models on Google Cloud infrastructure. Lead performance profiling, debugging, and troubleshooting of customer training and inference workloads.
  3. Collaborate with internal infrastructure and ML teams to improve Google Cloud's ability to support AI workloads.
  4. Develop and deliver training materials and demos to empower customers and internal teams.
  5. Contribute to the continuous improvement of our products by identifying and reporting bugs and suggesting enhancements. Proactively identify and address technical bottlenecks hindering customer success.

Skills

Required

  • software development
  • ML design
  • ML infrastructure optimization
  • model deployment
  • model evaluation
  • data processing
  • debugging
  • fine-tuning
  • technical leadership
  • people management
  • team leadership
  • Mandarin
  • English

Nice to have

  • working in a complex, matrixed organization
  • cross-functional projects
  • cross-business projects
  • customer collaboration
  • field team collaboration
  • internal quality and repro testing
  • AI/ML
  • infrastructure technologies
  • product improvement
  • bug fixes
  • feature enhancements

What the JD emphasized

  • technical leadership
  • optimize the performance of their AI/ML models
  • customer training and inference workloads
  • AI/ML and Infrastructure technologies

Other signals

  • AI/ML infrastructure
  • customer AI/ML models
  • training and inference workloads
  • Google Cloud