Senior Research Scientist, ML Efficiency, Google Research

Google · Big Tech · Singapore

Research Scientist focused on improving the computational efficiency of generative AI models (LLMs, diffusion models, generative video) through foundational research in algorithmic efficiency, model compression, and inference acceleration. The role involves innovating algorithms, optimizing model architectures, and improving the deployment pipeline (pretraining, tuning, RL), as well as collaborating with hardware/software teams to optimize inference engines and reduce latency and memory usage.

What you'd actually do

  1. Advance algorithms, sampling techniques, and optimization to make serving and inference of generative AI models more efficient and flexible. This includes model compression, knowledge distillation, and quantization strategies.
  2. Innovate algorithms and large language model architectures that improve the computational efficiency and generalization of model training.
  3. Improve the model deployment pipeline, including entirely new formulations of pretraining, instruction tuning, reinforcement learning, and thinking/reasoning.
  4. Collaborate with Hardware and Software teams to optimize kernels and inference engines, across different hardware and model architectures.
  5. Optimize latency, memory bandwidth, and workloads.
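The compression work named in item 1 can be made concrete with a minimal sketch of symmetric per-tensor int8 post-training quantization, one of the simplest quantization strategies the role alludes to (the function names and NumPy implementation here are illustrative assumptions, not anything specified by the listing):

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats onto [-127, 127]."""
    scale = np.abs(weights).max() / 127.0  # one scale shared by the whole tensor
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from int8 codes and the scale."""
    return q.astype(np.float32) * scale

# Round-trip a random weight matrix and measure the quantization error.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
err = np.abs(w - dequantize_int8(q, scale)).max()
```

The maximum round-trip error is bounded by roughly half a quantization step (scale / 2), which is why storing weights as int8 plus one float scale cuts memory 4x versus float32 at modest accuracy cost; per-channel scales and calibration refine this basic scheme.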

Skills

Required

  • PhD degree in Computer Science, a related field, or equivalent practical experience
  • 2 years of experience leading a research agenda
  • One or more scientific publication submissions to conferences, journals, or public repositories (e.g., CVPR, ICCV, NeurIPS, ICML, ICLR)

Nice to have

  • 5 years of experience driving new research ideas from problem abstraction and solution design through experimentation to productionization in a rapidly shifting landscape
  • Understanding of transformer architecture internals
  • Passion for deep/machine learning, computational statistics, and applied mathematics
  • Excellent technical leadership and communication skills to conduct multi-team cross-functional collaborations

What the JD emphasized

  • Computational Efficiency of Generative AI Models
  • algorithmic efficiency
  • model compression
  • inference acceleration
  • serving and inference of generative AI models more efficient
  • model compression, knowledge distillation and quantization strategies
  • computation efficiency and generalization of training learning models
  • pretraining, instruction tuning, reinforcement learning
  • optimize kernels and inference engines
  • Optimize latency, memory bandwidth, and workloads
  • One or more scientific publication submissions for conferences, journals, or public repositories (such as CVPR, ICCV, NeurIPS, ICML, ICLR, etc.)