Deployment Lead, AI Infrastructure, Google Cloud

Google · Big Tech · Sydney NSW, Australia

This role focuses on deploying and managing AI infrastructure for enterprise clients on Google Cloud. While the role requires experience with AI/ML infrastructure (GPUs, GKE, Kubeflow) and potentially tuning AI/ML models, the core responsibilities are around customer engagement, solution architecture, and managing the deployment of cloud-based distributed systems, rather than directly building or researching AI models themselves. The role acts as a bridge between hardware and enterprise software for large-scale computing solutions.

What you'd actually do

  1. Work with customer technical leads, client executives, and partners to manage and deliver successful implementations of cloud solutions, becoming a trusted advisor to decision makers throughout the engagement.
  2. Propose solution architectures and manage the deployment of cloud-based, distributed, virtualized infrastructure solutions according to complex customer requirements and implementation best practices.
  3. Work with internal specialists, Product, and Engineering teams to package approaches, best practices, and lessons learned into thought leadership, methodologies, and published assets.
  4. Interact with Business, Partners, and customer technical stakeholders to manage project scope, priorities, deliverables, risks and issues, and timelines for successful client outcomes.

Skills

Required

  • Bachelor’s degree or equivalent practical experience
  • 10 years of experience troubleshooting technical issues for internal/external partners or customers
  • 7 years of experience in customer management and stakeholder engagement for deployments
  • 5 years of experience with Artificial Intelligence/Machine Learning (AI/ML) Infrastructure (e.g., Graphics Processing Units (GPUs), Google Kubernetes Engine (GKE), Kubernetes, Kubeflow)
  • Experience in either system design or reading code (e.g., Java, C++, Python)

Nice to have

  • MBA or Master's degree in Computer Science, Engineering, or a related field
  • Experience developing and tuning AI/ML models using frameworks such as JAX, PyTorch, or OpenXLA
  • Experience in one or more of the following areas: Developer Operations (DevOps), Security, Site Reliability Engineering (SRE)
  • Experience in one or more of the following areas: Data Center Infrastructure, Networking, Compute, Storage
  • Customer-facing migration experience, including service discovery, assessment, planning, execution, and operations