Engineering Manager - Model Performance

Baseten · Data AI · San Francisco, CA · EPD

Baseten provides inference infrastructure for AI companies. As Engineering Manager for Model Performance, you would lead a team of engineers optimizing ML model inference, with a focus on production-grade AI/ML systems and scaling large models. The role requires a strong engineering background, leadership experience, and expertise in ML performance optimization, including hands-on work with tools like TensorRT, PyTorch, and CUDA.

What you'd actually do

  1. Lead, mentor, and manage a team of engineers focused on developing and optimizing ML model inference and performance.
  2. Oversee technical strategy and architecture decisions, driving improvements across our engineering organization.
  3. Collaborate with cross-functional teams to ensure seamless integration and scalability of ML models in production environments.
  4. Dive into the codebases of frameworks and toolkits such as TensorRT, PyTorch, and CUDA to identify and resolve complex performance bottlenecks.
  5. Drive the development and deployment of large-scale optimization techniques for various ML models, especially large language models (LLMs).

Skills

Required

  • Python
  • C++
  • Go
  • PyTorch
  • TensorRT
  • CUDA
  • Docker
  • Kubernetes
  • ML model performance optimization
  • production-level AI/ML solutions
  • scaling and deploying large models
  • team leadership
  • project management

Nice to have

  • LLM optimization techniques (quantization, speculative decoding, continuous batching)
  • GPU architecture and performance tuning
  • startup environment experience

What the JD emphasized

  • ML model performance optimization
  • production-level AI/ML solutions
  • scaling and deploying large models
  • large language models (LLMs)

Other signals

  • powers mission-critical inference
  • leading and mentoring a team of exceptional engineers
  • ML performance and inference
  • hands-on with technology
  • driving improvements across our engineering organization