Senior Customer Engineer, AI Infrastructure, Google Cloud

Google · Big Tech · Singapore

This Senior Customer Engineer role focuses on AI infrastructure, specifically Google Cloud TPUs, for enterprise clients. The role involves designing, deploying, and optimizing AI training and inference solutions, advising customers on ML operations (MLOps), and supporting sales teams by solving technical challenges across AI hardware and software stacks.

What you'd actually do

  1. Design and implement complex, multi-host AI training and inference solutions on Google Cloud TPUs, with a focus on scalability and performance tuning.
  2. Conduct in-depth performance profiling and optimization of customer models and data pipelines specifically for the TPU architecture, identifying and resolving bottlenecks.
  3. Advise customers on best practices for integrating their MLOps workflows with the Google Cloud AI platform ecosystem so they can use TPUs efficiently.

Skills

Required

  • Deep learning frameworks (TensorFlow, PyTorch, JAX)
  • TPU hardware optimization
  • Distributed AI training
  • Networking principles for AI
  • Customer-facing technical support

Nice to have

  • Custom kernel development
  • XLA compiler familiarity
  • AI hardware and software stacks
  • AI infrastructure market knowledge

What the JD emphasized

  • 10 years of experience developing and deploying models with deep learning frameworks (e.g., TensorFlow, PyTorch, or JAX), specifically on TPU hardware.
  • Experience with networking principles, including collective communication and inter-chip interconnects (ICI), and their impact on distributed AI training.
  • Experience using AI hardware and software stacks and platforms to bring up and deploy AI compute clusters.

Other signals

  • Customer-facing role
  • AI infrastructure
  • TPU/GPU optimization
  • Distributed training/inference