What you'd actually do

Architect and implement the transition from data-parallel to model-parallel training paradigms.

Design and manage large-scale training runs across multi-pod environments, maximizing data center network bandwidth and minimizing communication bottlenecks.

Research and integrate transformer model optimizations and novel architectural variants to reduce training time and resource consumption.

Write and optimize low-level model code, including custom pallas kernels, to maximize performance out of the hardware.

Work cross-functionally with the team and the Kernel optimization team to co-design and implement compiler-level optimizations that accelerate model execution.

Skills

Required

Python
C++
ML infrastructure
model deployment
model evaluation
optimization
data processing
debugging

Nice to have

JAX
PyTorch
TensorFlow
core internals of deep learning frameworks
hardware-aware optimizations
machine learning compilers
XLA
MLIR
Large Language Models (LLMs)
foundation models
model-parallel configurations
tensor-parallel configurations
pipeline-parallel configurations

We are the RecML team in Core ML's Applied ML organization. Our mission is to accelerate product innovations through ML for recommendations and user modeling. We deeply engage with Alphabet products areas and partner with them to help accelerate product innovations through applied research in recommendations and user modeling. We generalize successful innovations into standardized, maintainable, and production-grade solutions for use by other teams and products. This opportunity is a horizontal ML infra and efficiency role supporting the training framework of our foundation recommender model and its customers.

The US base salary range for this full-time position is $174,000-$252,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.

Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google.

Responsibilities

Architect and implement the transition from data-parallel to model-parallel training paradigms.
Design and manage large-scale training runs across multi-pod environments, maximizing data center network bandwidth and minimizing communication bottlenecks.
Research and integrate transformer model optimizations and novel architectural variants to reduce training time and resource consumption.
Write and optimize low-level model code, including custom pallas kernels, to maximize performance out of the hardware.
Work cross-functionally with the team and the Kernel optimization team to co-design and implement compiler-level optimizations that accelerate model execution.

Qualifications

Minimum qualifications:

Bachelor’s degree or equivalent practical experience.
5 years of experience programming in Python or C++.
3 years of experience with ML infrastructure (e.g., model deployment, model evaluation, optimization, data processing, debugging).

Preferred qualifications:

Master’s degree or PhD in Computer Science, Machine Learning, Computer Engineering, or a related technical field.
Experience scaling machine learning models (e.g., Large Language Models (LLMs) or foundation models), managing the complexities of transitioning architectures from data-parallel to model, tensor, pipeline-parallel configurations, or related fields.
Experience with deep learning frameworks (e.g., JAX, PyTorch, or TensorFlow), including a track record of contributing to or modifying their core internals to support novel and emerging use cases.
Experience with co-designing hardware-aware optimizations to accelerate model execution.
Knowledge of machine learning compilers (e.g., Accelerated Linear Algebra (XLA) or Multi-Level Intermediate Representation (MLIR)).

Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google.

Responsibilities

Architect and implement the transition from data-parallel to model-parallel training paradigms.
Design and manage large-scale training runs across multi-pod environments, maximizing data center network bandwidth and minimizing communication bottlenecks.
Research and integrate transformer model optimizations and novel architectural variants to reduce training time and resource consumption.
Write and optimize low-level model code, including custom pallas kernels, to maximize performance out of the hardware.
Work cross-functionally with the team and the Kernel optimization team to co-design and implement compiler-level optimizations that accelerate model execution.

Qualifications

Minimum qualifications:

Bachelor’s degree or equivalent practical experience.
5 years of experience programming in Python or C++.
3 years of experience with ML infrastructure (e.g., model deployment, model evaluation, optimization, data processing, debugging).

Preferred qualifications:

Master’s degree or PhD in Computer Science, Machine Learning, Computer Engineering, or a related technical field.
Experience scaling machine learning models (e.g., Large Language Models (LLMs) or foundation models), managing the complexities of transitioning architectures from data-parallel to model, tensor, pipeline-parallel configurations, or related fields.
Experience with deep learning frameworks (e.g., JAX, PyTorch, or TensorFlow), including a track record of contributing to or modifying their core internals to support novel and emerging use cases.
Experience with co-designing hardware-aware optimizations to accelerate model execution.
Knowledge of machine learning compilers (e.g., Accelerated Linear Algebra (XLA) or Multi-Level Intermediate Representation (MLIR)).

Senior Software Engineer, Machine Learning, Core ML

What you'd actually do

Skills

Required

Nice to have

What the JD emphasized

Other signals

Responsibilities

Qualifications

Minimum qualifications:

Preferred qualifications:

Responsibilities

Qualifications

Minimum qualifications:

Preferred qualifications: