Senior Research Scientist, ML Efficiency, Google Research

Google · Big Tech · Singapore

Research Scientist focused on improving the computational efficiency of generative AI models (LLMs, diffusion models, generative video) through foundational research in algorithmic efficiency, model compression, and inference acceleration. The role involves innovating algorithms, optimizing model architectures, and improving the deployment pipeline (pretraining, tuning, RL), as well as collaborating with hardware/software teams to optimize inference engines and reduce latency and memory usage.

What you'd actually do

  1. Advance algorithms, sampling techniques, and optimization to make serving and inference of generative AI models more efficient and flexible. This includes model compression, knowledge distillation, and quantization strategies.
  2. Innovate algorithms and large language model architectures that improve the computational efficiency and generalization of model training.
  3. Improve the model deployment pipeline, including entirely new formulations of pretraining, instruction tuning, reinforcement learning, and thinking/reasoning.
  4. Collaborate with Hardware and Software teams to optimize kernels and inference engines, across different hardware and model architectures.
  5. Optimize latency, memory bandwidth, and workloads.
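The compression work named in item 1 can be made concrete with a minimal sketch of symmetric per-tensor int8 post-training quantization, one of the simplest quantization strategies the role alludes to (the function names and NumPy implementation here are illustrative assumptions, not anything specified by the listing):

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats onto [-127, 127]."""
    scale = np.abs(weights).max() / 127.0  # one scale shared by the whole tensor
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from int8 codes and the scale."""
    return q.astype(np.float32) * scale

# Round-trip a random weight matrix and measure the quantization error.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
err = np.abs(w - dequantize_int8(q, scale)).max()
```

The maximum round-trip error is bounded by roughly half a quantization step (scale / 2), which is why storing weights as int8 plus one float scale cuts memory 4x versus float32 at modest accuracy cost; per-channel scales and calibration refine this basic scheme.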

Skills

Required

  • PhD degree in Computer Science, a related field, or equivalent practical experience
  • 2 years of experience leading a research agenda
  • One or more scientific publication submissions to conferences, journals, or public repositories (e.g., CVPR, ICCV, NeurIPS, ICML, ICLR)

Nice to have

  • 5 years of experience driving new research ideas from problem abstraction and solution design through experimentation to productionization in a rapidly shifting landscape
  • Understanding of transformer architecture internals
  • Passion for deep/machine learning, computational statistics, and applied mathematics
  • Excellent technical leadership and communication skills to conduct multi-team cross-functional collaborations

What the JD emphasized

  • Computational Efficiency of Generative AI Models
  • algorithmic efficiency
  • model compression
  • inference acceleration
  • serving and inference of generative AI models more efficient
  • model compression, knowledge distillation and quantization strategies
  • computation efficiency and generalization of training learning models
  • pretraining, instruction tuning, reinforcement learning
  • optimize kernels and inference engines
  • Optimize latency, memory bandwidth, and workloads
  • One or more scientific publication submissions for conferences, journals, or public repositories (such as CVPR, ICCV, NeurIPS, ICML, ICLR, etc.)