Research Scientist, ML Efficiency, Google Research

Google · Big Tech · Singapore

Research Scientist focused on improving the computational efficiency of large-scale Generative AI Models (LLMs, Diffusion Models, Generative Videos) through algorithmic research, model compression, and inference acceleration. The role involves advancing algorithms for serving and inference, innovating training architectures, optimizing deployment pipelines, and collaborating with hardware/software teams. A PhD and publication record are required.

What you'd actually do

  1. Advance algorithms, sampling techniques, and large-scale optimization to make serving and inference of generative AI models more efficient and flexible. This includes model compression, knowledge distillation, and quantization strategies.
  2. Innovate algorithms and large language model architectures that improve the computational efficiency and generalization of deep learning model training.
  3. Improve the end-to-end model deployment pipeline, including entirely new formulations of pretraining, instruction tuning, reinforcement learning, and thinking and reasoning.
  4. Collaborate with hardware and software teams to optimize kernels and inference engines, across different hardware and model architectures.
  5. Optimize latency, memory bandwidth, and workloads.
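To give a concrete flavor of the quantization strategies named in item 1, here is a minimal sketch of symmetric per-tensor int8 post-training quantization. The function names and the NumPy-based setup are illustrative assumptions, not anything specified by the role:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: w ~= scale * q."""
    # Map the largest absolute weight to the int8 extreme 127.
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor from int8 codes."""
    return q.astype(np.float32) * scale

# Illustrative usage: a 4x int8-over-fp32 memory reduction,
# at the cost of a bounded rounding error per element.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)
q, scale = quantize_int8(w)
err = float(np.abs(w - dequantize_int8(q, scale)).max())
```

The rounding error of this scheme is bounded by half the scale per element; production serving stacks typically refine it with per-channel scales or calibration data.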

Skills

Required

  • PhD degree in Computer Science, a related field, or equivalent practical experience
  • One or more scientific publication submissions to conferences, journals, or public repositories (such as CVPR, ICCV, NeurIPS, ICML, ICLR, etc.)

Nice to have

  • Experience in university or industry labs, with a primary emphasis on AI research
  • Understanding of transformer architecture internals
  • Ability to drive new research ideas from problem abstraction and solution design through experimentation to productionisation in a rapidly shifting landscape
  • Excellent technical leadership and communication skills for multi-team, cross-functional collaborations
  • Passion for deep/machine learning, computational statistics, and applied mathematics

What the JD emphasized

  • computational efficiency of large-scale generative AI models
  • algorithmic efficiency
  • model compression
  • inference acceleration
  • serving and inference
  • model deployment pipeline
  • pretraining
  • instruction tuning
  • reinforcement learning
  • optimize kernels and inference engines
  • optimization
  • publication submissions for conferences, journals, or public repositories

Other signals

  • computational efficiency
  • generative AI models
  • inference acceleration
  • model compression
  • quantization