What you'd actually do

Design and implement complex, multi-host AI training and inference solutions on Google Cloud TPUs, with a focus on scalability and performance tuning.

Conduct in-depth performance profiling and optimization of customer models and data pipelines specifically for the TPU architecture, identifying and resolving bottlenecks.

Advise customers on best practices for integrating their ML operations workflows with the Google Cloud AI platform ecosystem for seamless TPU utilization.

Skills

Required

Deep learning frameworks (TensorFlow, PyTorch, JAX)
TPU hardware optimization
Networking principles for distributed AI
Performance profiling and optimization
Customer consultation

Nice to have

Custom kernel development
XLA compiler familiarity
AI hardware and software stacks
AI infrastructure market knowledge

When leading companies choose Google Cloud, it's a huge win for spreading the power of cloud computing globally. Once educational institutions, government agencies, and other businesses sign on to use Google Cloud products, you help make their work more productive, mobile, and collaborative. You listen and deliver what is most helpful for the customer. You assist fellow sales Googlers by solving key technical issues for our customers. You liaise with the product marketing management and engineering teams to stay on top of industry trends and devise enhancements to Google Cloud products.

As a Customer Engineer, you will understand the needs of our customers and help shape the future using AI technology. You will work with Google Cloud Platform (GCP) technology and its complete AI stack, and position them to our customers in all verticals.

In this role, you will support Google Cloud Sales teams in deploying AI/ML accelerators (e.g., TPUs and GPUs) at AI innovators, large enterprises, and early-stage AI startups. You will help customers innovate faster with solutions built on Google Cloud’s flexible and open AI infrastructure.

You will work with Google customers on AI Infrastructure server and networking deployments. You will guide customer discussions on network topologies and compute/storage, and support the bring-up of server, network, cluster, or cooling deployments, which will include visits to the customer data center during the bring-up phase.

Google Cloud accelerates every organization’s ability to digitally transform its business and industry. We deliver enterprise-grade solutions that leverage Google’s cutting-edge technology, and tools that help developers build more sustainably. Customers in more than 200 countries and territories turn to Google Cloud as their trusted partner to enable growth and solve their most critical business problems.

Qualifications

Minimum qualifications:

Bachelor's degree or equivalent practical experience.
10 years of experience in developing and deploying models using deep learning frameworks (e.g., TensorFlow, PyTorch, or JAX) specifically on TPU hardware.
Experience in networking principles, including collective communication, inter-chip interconnects, and distributed AI training.

Preferred qualifications:

Experience with lower-level performance tools and techniques (e.g., custom kernel development, XLA compiler familiarity) relevant to optimizing code for Google's TPU chips.
Experience leveraging AI hardware and software stacks and platforms to bring up and deploy AI compute clusters.
Knowledge of AI accelerator hardware (e.g., specific GPU generations) to effectively articulate the architectural differentiation and value proposition of cloud TPUs.
Knowledge of the AI infrastructure market, including main technology providers, differentiators, and trends.

Customer Engineer, AI Infrastructure, Google Cloud