Senior Software Engineer, Machine Learning, Core ML

Google · Big Tech · Mountain View, CA

Senior Software Engineer on the RecML team, focused on scaling machine learning for recommendations and user modeling. The role involves architecting and implementing model-parallel training, optimizing transformer models, and writing low-level code for performance. It is a horizontal ML infrastructure and efficiency role supporting the training framework for foundation recommender models.

What you'd actually do

  1. Architect and implement the transition from data-parallel to model-parallel training paradigms (see the sharding sketch after this list).
  2. Design and manage large-scale training runs across multi-pod environments, maximizing utilization of data center network bandwidth and minimizing communication bottlenecks.
  3. Research and integrate transformer model optimizations and novel architectural variants to reduce training time and resource consumption.
  4. Write and optimize low-level model code, including custom Pallas kernels, to extract maximum performance from the hardware (a minimal kernel sketch also follows the list).
  5. Work cross-functionally with the kernel optimization team to co-design and implement compiler-level optimizations that accelerate model execution (an HLO-inspection sketch follows as well).
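
The posting contains no code, but since JAX, Pallas, and XLA are all named in the skills below, a minimal sketch of what item 1's data-parallel-to-model-parallel transition looks like with jax.sharding may help ground it. The mesh axis name, shapes, and variable names here are illustrative assumptions, not details from the JD.

```python
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Arrange every available device along a single "model" mesh axis.
n_devices = len(jax.devices())
mesh = Mesh(mesh_utils.create_device_mesh((n_devices,)), axis_names=("model",))

# Data-parallel layout: every device keeps a full replica of the weights.
replicated = NamedSharding(mesh, P())
# Model-parallel layout: the weight's output dimension is split across devices.
column_sharded = NamedSharding(mesh, P(None, "model"))

w = jnp.zeros((1024, 4096))                # illustrative weight matrix
w_dp = jax.device_put(w, replicated)       # full 1024x4096 copy per device
w_mp = jax.device_put(w, column_sharded)   # 1024 x (4096 // n_devices) shard per device

@jax.jit
def forward(x, w):
    # Under jit, XLA's SPMD partitioner reads the operand shardings and
    # inserts the collectives (all-gather, reduce-scatter, ...) they imply.
    return x @ w

x = jnp.ones((32, 1024))
print(forward(x, w_mp).sharding)           # output sharding chosen by the partitioner
```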
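
Item 4's "custom Pallas kernels" refers to JAX's Pallas kernel-authoring API. Below is a toy element-wise kernel showing the shape of that API, run in interpret mode so it works without a TPU or GPU; it is a hypothetical sketch, not code from the posting.

```python
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

def add_kernel(x_ref, y_ref, o_ref):
    # Each kernel invocation reads its input blocks and writes the sum.
    o_ref[...] = x_ref[...] + y_ref[...]

@jax.jit
def add(x, y):
    return pl.pallas_call(
        add_kernel,
        out_shape=jax.ShapeDtypeStruct(x.shape, x.dtype),
        interpret=True,  # run on CPU for this sketch; drop on real TPU/GPU
    )(x, y)

print(add(jnp.arange(8.0), jnp.ones(8)))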
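
For item 5, compiler co-design in this stack usually starts with reading what XLA actually does to a program. As a small illustration (the function and shapes are made up), jax.jit exposes both the StableHLO emitted before optimization and the HLO produced after XLA's fusion and layout passes:

```python
import jax
import jax.numpy as jnp

def layer(x, w):
    return jax.nn.relu(x @ w)

x = jnp.ones((8, 128))
w = jnp.ones((128, 128))

lowered = jax.jit(layer).lower(x, w)
print(lowered.as_text()[:400])     # StableHLO as emitted by JAX, pre-XLA
compiled = lowered.compile()
print(compiled.as_text()[:400])    # HLO after XLA's optimization passes
```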

Skills

Required

  • Python
  • C++
  • ML infrastructure
  • model deployment
  • model evaluation
  • optimization
  • data processing
  • debugging

Nice to have

  • JAX
  • PyTorch
  • TensorFlow
  • core internals of deep learning frameworks
  • hardware-aware optimizations
  • machine learning compilers
  • XLA
  • MLIR
  • Large Language Models (LLMs)
  • foundation models
  • model-parallel configurations
  • tensor-parallel configurations
  • pipeline-parallel configurations

What the JD emphasized

  • model-parallel training
  • transformer model optimizations
  • custom Pallas kernels
  • compiler-level optimizations
  • scaling machine learning models
  • foundation models
  • model deployment
  • model evaluation
  • optimization
  • data processing
  • debugging
