Staff Software Engineer, On-device Machine Learning Infrastructure

Google · Big Tech · Sunnyvale, CA

Staff Software Engineer focused on building and optimizing on-device ML infrastructure for Google's flagship products, enabling the deployment of models like Gemini Nano and Gemma across various hardware accelerators on Android, Chrome, and more. The role involves creating roadmaps for developer-facing APIs/SDKs, solving complex performance optimization problems for Generative AI on heterogeneous hardware, designing resilient systems, and coordinating efforts across multiple internal teams.

What you'd actually do

  1. Create roadmaps for developer-facing Application Programming Interfaces (APIs), Software Development Kits (SDKs), and tools, ensuring they meet the evolving needs of Large Language Model (LLM) workflows.
  2. Solve technically complex problems that exceed the scope of a generalist Software Engineer, specifically around optimizing Generative AI performance across heterogeneous hardware (CPUs, GPUs, and Edge TPUs).
  3. Guide the team in designing resilient and robust systems, proactively anticipating scaling bottlenecks or shifts in usage as LLMs become increasingly complex.
  4. Coordinate efforts across multiple groups, including Android ML, ML Compiler, and DeepMind, to co-design performance and evaluation workflows.
  5. Provide technical mentorship and implement new practices that address team needs and increase the velocity of your teammates.

Skills

Required

  • software development
  • software design and architecture
  • ML infrastructure
  • model deployment
  • model evaluation
  • data processing
  • debugging
  • fine-tuning
  • speech/audio
  • reinforcement learning
  • ML design

Nice to have

  • Master’s degree or PhD in Engineering, Computer Science, or a related technical field
  • data structures and algorithms
  • experience navigating complex organizations with cross-functional or cross-business projects
  • ML converters/compilers and runtimes
  • hardware-accelerated ML inference techniques
  • Generative AI model architectures
  • optimization for on-device execution

What the JD emphasized

  • on-device deployment
  • optimizing Generative AI performance
  • heterogeneous hardware
  • ML infrastructure
  • on-device ML infrastructure

Other signals

  • on-device ML infrastructure
  • performance optimization
  • deployment across accelerators
  • leading ML projects on-device