Customer Engineer, AI Infrastructure Mo… at Google

What you'd actually do

Become a trusted advisor to the top customers, helping them understand and incorporate AI accelerators into their overall cloud and IT strategy by designing training and inferencing platforms, using the accelerators Google Cloud has to offer.

Demonstrate how Google Cloud is differentiated, highlighting the power of accelerators by working with customers on POCs, demonstrating features, optimizing model performance, profiling, and benchmarking.

Design and implement complex, multi-host AI training and inferencing solutions on Google Cloud TPUs, focusing on scalability and performance tuning.

Conduct performance profiling and optimization of customer models and data pipelines for the TPU architecture, identifying and resolving bottlenecks.

Advise customers on best practices for integrating their MLOps workflows with the Google Cloud AI Platform ecosystem for TPU utilization.

Skills

Required

cloud-native architectures
modern cloud infrastructure
networking (switching/routing for Ethernet/RoCE/InfiniBand)
customer-facing or support roles
developing and deploying models using deep learning frameworks (TensorFlow, PyTorch, or JAX)

Nice to have

AI Infrastructure systems
DPU, RoCE, InfiniBand
cooling
accelerators (GPUs and TPUs)
AI and software stacks and platforms
AI infrastructure market knowledge

When leading companies choose Google Cloud, it's a huge win for spreading the power of cloud computing globally. Once educational institutions, government agencies, and other businesses sign on to use Google Cloud products, you come in to facilitate making their work more productive, mobile, and collaborative. You listen and deliver what is most helpful for the customer. You assist fellow sales Googlers by problem-solving key technical issues for our customers. You liaise with the product marketing management and engineering teams to stay on top of industry trends and devise enhancements to Google Cloud products.

In this role, you will understand the needs of our customers and help shape the future using AI technology. You will work with Google Cloud Platform's technology and complete AI stack and position them to our customers in all verticals. You will support Google Cloud sales teams in piloting and deploying Google Cloud’s industry-leading AI/ML accelerators (TPU/GPU) at AI innovators, large enterprises, and early-stage AI startups. You will help customers innovate with solutions using Google Cloud’s flexible and open AI infrastructure.

You will work with Google customers on AI Infrastructure server and networking infrastructure deployments. You will guide customer discussions on network topologies, compute, and storage, and support the bring-up of server, network, cluster, and cooling deployments. You will need to visit the customer data center during the bring-up phase. You will serve as a technical expert on the Google Cloud AI infrastructure, specifically guiding customers through the architecture, deployment, and optimization of large-scale, cost-efficient training and inference jobs on Cloud TPUs.

Google Cloud accelerates every organization’s ability to digitally transform its business and industry. We deliver enterprise-grade solutions that leverage Google’s cutting-edge technology, and tools that help developers build more sustainably. Customers in more than 200 countries and territories turn to Google Cloud as their trusted partner to enable growth and solve their most critical business problems.

Responsibilities

Become a trusted advisor to the top customers, helping them understand and incorporate AI accelerators into their overall cloud and IT strategy by designing training and inferencing platforms, using the accelerators Google Cloud has to offer.
Demonstrate how Google Cloud is differentiated, highlighting the power of accelerators by working with customers on POCs, demonstrating features, optimizing model performance, profiling, and benchmarking.
Design and implement complex, multi-host AI training and inferencing solutions on Google Cloud TPUs, focusing on scalability and performance tuning.
Conduct performance profiling and optimization of customer models and data pipelines for the TPU architecture, identifying and resolving bottlenecks.
Advise customers on best practices for integrating their MLOps workflows with the Google Cloud AI Platform ecosystem for TPU utilization.

Qualifications

Minimum qualifications:

Bachelor's degree in Computer Science, Mathematics, a related technical field, or equivalent practical experience.
10 years of experience with cloud-native architectures and modern cloud infrastructure, including networking (switching/routing for Ethernet, RoCE, and InfiniBand), in customer-facing or support roles.
Experience developing and deploying models using deep learning frameworks (TensorFlow, PyTorch, or JAX).

Preferred qualifications:

Master's degree in Computer Science, Mathematics, or a related technical field.
Experience as an IT infrastructure consultant or enterprise architect working in data center investment strategies and proposals.
Experience with AI Infrastructure systems, networking technologies (e.g., DPU, RoCE, InfiniBand), cooling, and accelerators (GPUs and TPUs).
Experience in leveraging main AI and software stacks and platforms to bring up and deploy AI compute clusters.
Knowledge of the AI infrastructure market, including main technology providers, differentiators and trends.
Ability to work and grow in fluid environments.


Customer Engineer, AI Infrastructure Modernization TPU, Google Cloud
