Staff Research Scientist, ML Efficiency, Google Research

Google · Big Tech · Singapore

Research Scientist focused on improving the computational efficiency of large-scale generative AI models (LLMs, Diffusion Models, Generative Videos) through advanced algorithms, model compression, quantization, and optimization of training and inference pipelines. Collaborates with hardware and software teams to optimize kernels and inference engines.

What you'd actually do

  1. Advance algorithms, sampling techniques, and large-scale optimization to make serving and inference of generative AI models more efficient and flexible. This includes model compression, knowledge distillation, and quantization strategies.
  2. Innovate algorithms and large language model architectures that improve the computational efficiency and generalization of deep learning model training.
  3. Improve the end-to-end model deployment pipeline, including entirely new formulations of pretraining, instruction tuning, reinforcement learning, and thinking and reasoning.
  4. Collaborate with hardware and software teams to optimize kernels and inference engines, across different hardware and model architectures.
  5. Optimize latency, memory bandwidth, and workloads.
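To make the quantization work concrete: a minimal post-training symmetric int8 quantization sketch in NumPy. This is an illustrative simplification (per-tensor scaling, round-to-nearest), not a description of Google's internal tooling; the function names are hypothetical.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: w ~= scale * q."""
    max_abs = np.abs(w).max()
    # Map the largest magnitude to 127; guard against an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from int8 values and a scale."""
    return q.astype(np.float32) * scale

# Example: quantize a random weight matrix and check the reconstruction error.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
err = np.abs(w - w_hat).max()  # bounded by roughly scale / 2
```

Production schemes typically refine this with per-channel or per-group scales, asymmetric zero-points, and quantization-aware training, but the core idea of trading precision for memory bandwidth and throughput is the same.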

Skills

Required

  • PhD degree in Computer Science, a related field, or equivalent practical experience.
  • 4 years of experience in university or industry labs conducting Artificial Intelligence (AI) research.
  • One or more scientific publication submissions to conferences, journals, or public repositories (such as CVPR, ICCV, NeurIPS, ICML, ICLR).

Nice to have

  • Experience with deep/machine learning, computational statistics, and applied mathematics.
  • Knowledge of transformer architecture internals.
  • Ability to drive new research ideas from problem abstraction, designing solutions, experimentation, to productionisation in a rapidly shifting landscape.
  • Excellent technical leadership and communication skills for conducting cross-functional, multi-team collaborations.

What the JD emphasized

  • Computational Efficiency of large-scale Generative AI Models
  • serving and inference
  • model compression
  • knowledge distillation
  • quantization strategies
  • training deep learning models
  • pretraining
  • instruction tuning
  • reinforcement learning
  • optimize kernels
  • inference engines
