Principal Model Optimization Engineer

Roblox · Consumer · San Mateo, CA · Software Engineering

Roblox is seeking a Principal Model Optimization Engineer to optimize machine learning models for performance on GPU architectures, focusing on both training and inference workflows. This role involves low-level performance profiling, contributing to best practices and tooling, and collaborating with cross-functional teams to integrate optimized models into production. The ideal candidate has significant experience debugging GPUs and is proficient in advanced tools like CUDA, Triton, and TensorRT, with expertise in LLM optimization techniques such as speculative decoding and quantization.

What you'd actually do

  1. Optimize machine learning models for performance on GPU architectures, focusing on both training and inference workflows.
  2. Conduct low-level performance profiling and analysis to identify bottlenecks in existing machine learning pipelines and propose actionable improvements.
  3. Contribute to the development of best practices and tooling for model optimization and deployment.
  4. Collaborate with cross-functional teams, including data scientists and software engineers, to integrate and deploy optimized models into production environments.
  5. Partner across organizations to build tooling, interfaces, and visualizations that make ML@Roblox a delight to use.

Skills

Required

  • 6+ years of professional experience
  • system design experience
  • debugging GPUs
  • CUDA
  • Triton
  • TensorRT
  • model optimization techniques for LLMs
  • speculative decoding
  • continuous batching
  • quantization

Nice to have

  • Bachelor's degree in Computer Science, Computer Engineering, Data Science, or a similar technical field

What the JD emphasized

  • significant experience debugging GPUs
  • Proficient in advanced tools and frameworks (e.g., CUDA, Triton, TensorRT)
  • Experience with model optimization techniques for LLMs, such as speculative decoding, continuous batching, quantization, etc.
  • performance nut

Other signals

  • optimizing ML models for performance
  • training and inference workflows
  • low-level performance profiling
  • GPU architectures
  • model optimization techniques for LLMs