Machine Learning Platform Engineer

Together AI · Data AI · San Francisco, CA · Engineering

This Machine Learning Platform Engineer role at Together AI focuses on building a container platform, optimizing autoscaling, minimizing cold starts, and improving end-to-end model performance for custom models and dedicated inference. It involves optimizing inference across the stack, including CUDA kernels, PyTorch, inference engines, and container orchestration.

What you'd actually do

  1. Work on areas such as multi-cluster orchestration, portfolio optimization, predictive autoscaling, control planes, model bring-up, model optimization, APIs for managing deployments, inference worker SDKs, and CLI tools
  2. Analyze and improve the robustness and scalability of existing distributed systems, APIs, databases, and infrastructure
  3. Partner with product teams to understand functional requirements and deliver solutions that meet business needs
  4. Write clear, well-tested, and maintainable software and IaC for both new and existing systems
  5. Conduct design and code reviews, create developer documentation, and develop testing strategies for robustness and fault tolerance

Skills

Required

  • Python
  • Golang
  • Rust
  • C++
  • Haskell
  • Terraform
  • Kubernetes

Nice to have

  • serverless inference platforms
  • model bring-up
  • on-call experience
  • cloud provider experience
  • video generation
  • audio generation
  • CUDA kernels
  • PyTorch optimization
  • inference engines
  • container orchestration
  • queueing theory
  • ML bottlenecks

What the JD emphasized

  • 5+ years of demonstrated experience building large-scale, fault-tolerant distributed systems
  • Experience running serverless inference platforms, doing model bring-up on short notice, being on call, or running a cloud provider is a very big plus
  • Good taste and ability to thoughtfully discuss how what you’ve built has failed over time
  • Excellent understanding of low-level operating system concepts, including concurrency, networking, storage, performance, and scale
  • Experience with Kubernetes internals or other container orchestration systems

Other signals

  • optimizing autoscaling
  • minimizing cold starts
  • model performance
  • developer experience
  • CUDA kernels
  • PyTorch optimization
  • inference engines
  • container orchestration
  • queueing theory