Software Engineer III, Tensor Processing Units, AI/ML

Google · Big Tech · London, United Kingdom

Software Engineer III role focused on developing and supporting AI/ML frameworks and compilers for Google Cloud's Tensor Processing Units (TPUs) and Graphics Processing Units (GPUs). The role involves enabling large-scale training and inference of advanced ML models, including generative models, computer vision, NLP, and speech, with a focus on PyTorch. Collaboration with internal teams and external researchers is key to advancing the PyTorch-in-Cloud offering.

What you'd actually do

  1. Work on AI framework development to successfully enable PyTorch models.
  2. Provide comprehensive support for ML frameworks and compilers on Cloud Tensor Processing Units and Graphics Processing Units, enabling the training and deployment of the most advanced machine learning models.
  3. Enable PyTorch models at scale for generative models, computer vision (image recognition, object detection, image generation), machine translation, language modeling, ranking and recommendations, speech recognition, etc.
  4. Collaborate with other Google teams and leading researchers across the industry to continuously bring ML capabilities to our PyTorch-in-Cloud offering.
  5. Design, develop, test, deploy, maintain, and improve software.

Skills

Required

  • software development in C++ and Python
  • machine learning infrastructure
  • speech/audio
  • reinforcement learning

Nice to have

  • Master's degree or PhD in Computer Science or related technical fields
  • data structures and algorithms
  • large-scale distributed systems and system design
  • developing accessible technologies

What the JD emphasized

  • machine learning infrastructure (e.g., model deployment, model evaluation, data processing, debugging)
  • Speech/audio (e.g., technology duplicating and responding to the human voice), reinforcement learning, or specialization in another machine learning field.

Other signals

  • AI framework development
  • ML frameworks and compilers
  • training and inference workloads
  • PyTorch models at scale