Research Engineer, Core ML

Together AI · San Francisco, CA · Research

Research Engineer role focused on improving inference efficiency and unifying it with RL/post-training systems for production-grade AI APIs. The role involves end-to-end ownership of critical systems, translating frontier ideas into robust infrastructure, and shipping measurable improvements in latency, throughput, cost, and model quality at scale.

What you'd actually do

  1. Advance inference efficiency end‑to‑end
  2. Unify inference with RL / post‑training
  3. Own critical systems at production scale
  4. Provide technical leadership (Staff level)

Skills

Required

  • 3+ years of experience working on ML systems, large‑scale model training, inference, or adjacent areas
  • Advanced degree in Computer Science, EE, or a related field, or equivalent practical experience
  • Demonstrated experience owning complex technical projects end‑to‑end

Nice to have

  • Deep expertise in RL algorithms, scheduling methods, and inference optimizations
  • Experience with SGLang, vLLM, or similar serving stacks
  • Experience with speculative decoding systems such as ATLAS
  • Strong understanding of post-training and inference theory
  • Depth in RL-first or systems-first areas, with an appetite to collaborate across both
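The speculative decoding item above can be illustrated with a minimal sketch: a cheap draft model proposes a block of tokens, and the expensive target model verifies them, accepting the longest agreeing prefix and correcting the first mismatch. The `draft_next`/`target_next` functions here are toy stand-ins (deterministic rules over a tiny vocabulary), not any real model or the ATLAS system; the point is only the propose-then-verify control flow, which by construction reproduces the target model's greedy output.

```python
# Toy sketch of speculative decoding. Assumption: both "models" are
# stand-in functions that pick a next token from the context; in a real
# system these would be a small draft LLM and a large target LLM.

VOCAB = ["the", "cat", "sat", "on", "mat"]

def draft_next(context):
    # Cheap proposal model: deterministic toy rule based on context length.
    return VOCAB[len(context) % len(VOCAB)]

def target_next(context):
    # Expensive "ground truth" model: a different deterministic toy rule.
    return VOCAB[(len(context) * 2) % len(VOCAB)]

def speculative_decode(prompt, num_tokens, draft_len=4):
    out = list(prompt)
    while len(out) - len(prompt) < num_tokens:
        # 1) Draft model proposes a block of tokens autoregressively.
        proposal, ctx = [], list(out)
        for _ in range(draft_len):
            tok = draft_next(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2) Target model verifies the block: accept the longest prefix
        #    that matches its own choices, then correct the first mismatch.
        accepted, ctx = [], list(out)
        for tok in proposal:
            expected = target_next(ctx)
            if tok != expected:
                accepted.append(expected)  # correct and stop verifying
                break
            accepted.append(tok)
            ctx.append(tok)
        out.extend(accepted)  # at least one token accepted per round
    return out[len(prompt):][:num_tokens]
```

Because every accepted token is exactly what the target model would have emitted greedily, the output is identical to running the target alone; the speedup in real systems comes from verifying the whole drafted block in one batched forward pass instead of one pass per token.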

What the JD emphasized

  • shipping measurable improvements in latency, throughput, cost, and model quality at scale
  • translating new RL algorithms, scheduling methods, and inference optimizations into production-grade systems
  • modifying production inference systems
  • building and improving frontier models via RL pipelines
  • a bias toward implementation and shipping
  • demonstrated end-to-end ownership of complex technical projects

Other signals

  • turning frontier ideas into robust infrastructure