Software Engineer III, Core ML Performance

Google · Big Tech · Sunnyvale, CA +1

Software Engineer III on the Core ML Frameworks team at Google, focusing on developing and optimizing high-performance custom kernels for ML operations on TPU and GPU architectures. This role involves building infrastructure for kernel development, performance analysis, and supporting new hardware and ML operations, contributing to Google's ML stack and GCP.

What you'd actually do

  1. Contribute to the development and maintenance of Tokamax, a unified open-source kernel library, creating a home for high-quality, well-tested, easy-to-use, and performant kernels available to both internal and external users.
  2. Build infrastructure and tooling for kernel development, including benchmarking suites, auto-tuning frameworks, performance analysis tools, debugging tools, and continuous integration pipelines to ensure the correctness and performance of custom kernels across different hardware and model configurations.
  3. Design, develop, and optimize high-performance custom kernels (using languages like Pallas, Mosaic, and Triton) targeting TPU and GPU architectures for key machine learning operations.
  4. Investigate and implement custom kernel support for new accelerator hardware generations/features and emerging ML operations.
  5. Contribute to the documentation and usability of kernel tools and libraries to lower the barrier to entry for researchers and engineers looking to write or leverage custom kernels.

Skills

Required

  • software development
  • performance analysis
  • large-scale systems data analysis
  • visualization tools
  • debugging
  • computer architecture
  • performance modeling

Nice to have

  • data structures
  • algorithms
  • developing accessible technologies
  • Pallas
  • Mosaic
  • Triton
  • TPU
  • GPU

What the JD emphasized

  • high-quality
  • well-tested
  • easy-to-use
  • performant kernels
  • performance analysis tools
  • correctness and performance
  • high-performance custom kernels
  • performance modeling

Other signals

  • ML infrastructure
  • performance optimization
  • custom kernels
  • TPU/GPU architectures