Customer Engineer, AI Infrastructure, Google Cloud

Google · Big Tech · Singapore

Customer Engineer role focused on deploying and optimizing AI infrastructure (TPUs/GPUs) for customers on Google Cloud Platform, in support of AI training and inference solutions. The role requires deep technical expertise in AI hardware, distributed systems, and performance tuning for large-scale AI workloads.

What you'd actually do

  1. Design and implement complex, multi-host AI training and inference solutions on Google Cloud TPUs, focusing on scalability and performance tuning.
  2. Conduct in-depth performance profiling and optimization of customer models and data pipelines specifically for the TPU architecture, identifying and resolving bottlenecks.
  3. Advise customers on best practices for integrating their ML operations workflows with the Google Cloud AI platform ecosystem for seamless TPU utilization.

Skills

Required

  • Deep learning frameworks (TensorFlow, PyTorch, JAX)
  • TPU hardware optimization
  • Networking principles for distributed AI
  • Performance profiling and optimization
  • Customer consultation

Nice to have

  • Custom kernel development
  • XLA compiler familiarity
  • AI hardware and software stacks
  • AI infrastructure market knowledge

What the JD emphasized

  • 10 years of experience in developing and deploying models using deep learning frameworks (e.g., TensorFlow, PyTorch, or JAX) specifically on TPU hardware.
  • Experience in networking principles, including collective communication, inter-chip interconnects, and distributed AI training.

Other signals

  • Customer-facing role
  • AI infrastructure
  • TPU/GPU optimization
  • Distributed training/inference