Machine Learning Engineer - Inference

Together AI · Data AI · San Francisco, CA · Research

A Machine Learning Engineer role focused on optimizing the performance of AI inference systems, working with state-of-the-art large language models to run them efficiently and reliably at scale. Responsibilities include designing and building production systems, optimizing runtime inference services, and creating supporting tools and documentation.

What you'd actually do

  1. Design and build the production systems that power the Together AI inference engine, enabling reliability and performance at scale.
  2. Develop and optimize runtime inference services for large-scale AI applications.
  3. Collaborate with researchers, engineers, product managers, and designers to bring new features and research capabilities to the world.
  4. Conduct design and code reviews to ensure high standards of quality.
  5. Create services, tools, and developer documentation to support the inference engine.

Skills

Required

  • Python
  • PyTorch
  • high-performance libraries and tooling
  • low-level operating systems concepts: multi-threading, memory management, networking, storage
  • performance at scale

Nice to have

  • TGI
  • vLLM
  • TensorRT-LLM
  • Optimum
  • speculative decoding
  • CUDA
  • Triton
  • Rust
  • Cython
  • compilers

What the JD emphasized

  • high-performance
  • production-quality code
  • low-level operating systems concepts
  • AI inference systems
  • AI inference techniques
  • CUDA/Triton programming

Other signals

  • production systems
  • inference engine
  • large-scale AI applications
  • high-performance libraries and tooling