Together AI
Scaling the AI Frontier · Open-source model infra
- HQ: San Francisco, US
- Website: together.ai
Currently tracking 20 active AI roles, with 14 new openings in the last 4 weeks. Primary focus: Serve · Engineering. Salary range $160k–$300k (avg $226k).
- Hiring: 20 / 20
- Momentum (4w): +0 (0%) · 14 opens last 4w · 14 prior 4w
- Salary range: $160k–$300k (avg $226k) · USD · disclosed roles only
- Tracked since: Jan '24 · last role today
Hiring velocity (weekly chart)
Jobs (20)
| Title | Summary | Stage | AI score |
|---|---|---|---|
| Research Engineer, Core ML | Research Engineer role focused on improving inference efficiency and unifying it with RL/post-training systems for production-grade AI APIs. The role involves end-to-end ownership of critical systems, translating frontier ideas into robust infrastructure, and shipping measurable improvements in latency, throughput, cost, and model quality at scale. | Serve · Post-train | 10 |
| Forward Deployed Engineer (Inference & Post-Training) | Forward Deployed Engineer focused on optimizing inference engines and fine-tuning pipelines for production AI teams, acting as a technical partner to strategic customers. Responsibilities include inference engine optimization, performance tuning, post-training/fine-tuning (LoRA, SFT, DPO, RLHF, GRPO), customer alignment, onboarding, and providing product feedback. | Serve · Post-train | 9 |
| Senior Machine Learning Engineer, Voice AI | Senior ML Engineer focused on optimizing the model serving layer for voice AI workloads, including speech-to-text and text-to-speech models. The role involves hands-on work with inference engines, GPU optimization, batching strategies, and ensuring new model architectures can be productionized efficiently. The goal is to achieve best-in-class latency and reliability for real-time voice applications. | Serve | 9 |
| Research Engineer, Frontier Speculative Decoding | Research Engineer focused on translating internal model training research into production-ready deployments by fine-tuning general-purpose models into specialized tools. This involves designing novel speculative decoding algorithms, data curation, hyperparameter tuning, and checkpoint evaluation, with a focus on accuracy-efficiency tradeoffs for generative AI models. | Post-train · Serve | 9 |
| Systems Research Engineer, GPU Programming | This role focuses on optimizing and developing GPU-accelerated kernels and algorithms for ML/AI applications, requiring expertise in GPU programming (CUDA, Triton) and performance profiling. The engineer will collaborate with modeling, hardware, and software teams to enhance AI system efficiency and co-design GPU architectures. | Serve | 9 |
| AI Researcher, Core ML (Turbo) | AI Researcher focused on the intersection of efficient inference algorithms, architectures, engines, and post-training/RL systems for production-scale API services. The role involves advancing inference efficiency, unifying inference with RL/post-training, and owning critical systems. | Serve · Post-train | 9 |
| Forward Deployed Engineer (GPU Clusters) | The Forward Deployed Engineer (FDE) will be a technical partner to customers building large-scale AI models, focusing on GPU cluster infrastructure, networking, storage, and orchestration to ensure stability, optimize performance, and facilitate platform adoption. This role involves hardening clusters, tuning orchestration layers (Kubernetes/SLURM), debugging low-level bottlenecks, building reference designs, and leading benchmarking exercises. | Serve | 8 |
| Engineering Manager, Model Serving | Engineering Manager for Together AI's Model Serving platform, focusing on delivering world-class inference and fine-tuning in public APIs and customer deployments. Responsibilities include owning SLAs, improving testing/deployment/monitoring, building self-serve tooling, defining configuration best practices for inference engines, leading incident response, and mentoring team members. Requires 5+ years operating production ML inference or training systems at scale and 2+ years in senior IC or tech lead roles, with deep expertise in Kubernetes, multi-cluster orchestration, and ML serving frameworks. | Serve · Post-train | 8 |
| LLM Inference Frameworks and Optimization Engineer | Inference Frameworks and Optimization Engineer to design, develop, and optimize distributed inference engines for multimodal and language models. Focus on low-latency, high-throughput inference, GPU/accelerator optimizations, and software-hardware co-design for efficient large-scale AI deployment. | Serve | 8 |
| Machine Learning Engineer | Machine Learning Engineer at Together AI focused on developing and scaling production systems for LLM inference and fine-tuning APIs. Requires strong experience in high-performance, distributed systems and the LLM inference ecosystem. | Serve · Post-train | 8 |
| Machine Learning Engineer - Inference | Machine Learning Engineer focused on optimizing and enhancing the performance of AI inference systems, working with state-of-the-art large language models to ensure efficient and effective operation at scale. Responsibilities include designing and building production systems, optimizing runtime inference services, and creating supporting tools and documentation. | Serve | 8 |
| Senior Platform Engineer, Voice AI | Senior Platform Engineer for Together AI's Voice AI platform, focusing on the API and infrastructure layer for real-time speech-to-text and text-to-speech models. The role involves building WebSocket and HTTP APIs, designing autoscaling for latency-sensitive streaming, and ensuring platform reliability for production voice agents. | Serve | 7 |
| Backend Engineer | Senior Backend/Distributed Systems Engineer to build and maintain the Together AI Sandbox service, focusing on API platform performance, reliability, and scalability. Responsibilities include designing core backend components, performing research for AI workloads, and ensuring code quality through design and code reviews. | Serve | 7 |
| Together Cloud Infrastructure Engineer | This role focuses on building and maintaining the AI cloud infrastructure, including services for hardware management, IaaS software layer for GPU data centers, high-performance object storage for pretraining, and advanced observability stacks. The engineer will work on the core Together AI platform, create services and tools, and develop testing frameworks for robustness and fault-tolerance. | Serve · Data | 7 |
| Staff Engineer, Distributed Storage, HPC & AI Infrastructure | Staff Engineer focused on designing and delivering multi-petabyte distributed storage systems optimized for AI training and inference workloads. Responsibilities include architecting high-performance parallel filesystems and object stores, integrating cutting-edge technologies, driving cost optimization, and building Kubernetes-native storage operators and self-service platforms. The role requires deep expertise in distributed storage, Kubernetes, and performance optimization for GPU/HPC clusters, with strong coding skills in Go and Python. | Serve | 7 |
| Senior Backend Engineer, Inference Platform | Senior Backend Engineer focused on building and optimizing the inference platform for advanced generative AI models, including LLMs and multimodal models, at scale. The role involves optimizing latency, throughput, and resource allocation across tens of thousands of GPUs, collaborating with researchers to productionize frontier models, and contributing to open-source inference projects. | Serve | 7 |
| Machine Learning Platform Engineer | Machine Learning Platform Engineer at Together AI, focusing on building a container platform, optimizing autoscaling, minimizing cold starts, and improving end-to-end model performance for custom models and dedicated inference. The role involves optimizing inference across the stack, including CUDA kernels, PyTorch, inference engines, and container orchestration. | Serve | 7 |
| AI Infrastructure Engineer | AI Infrastructure Engineer responsible for keeping user-facing services and production systems running smoothly, applying engineering principles and automation to operating environments. Focuses on systems, availability, reliability, and scalability, with interests in algorithms and distributed systems. Builds and runs infrastructure using Ansible, Terraform, and Kubernetes, and designs monitoring systems. | Serve | 7 |
| Senior Software Engineer - Together Cloud Infrastructure | Senior Software Engineer focused on building and operating a high-performance, global AI cloud infrastructure platform. This includes designing and maintaining backend services for hardware management, IaaS software layer for GPU data centers, high-performance object storage for pretraining datasets, and advanced observability stacks for distributed pretraining. The role also involves architecture and research for decentralized AI workloads and contributing to the open-source platform. | Serve · Data | 7 |
| Solutions Architect | Solutions Architect at Together AI to work with customers and prospects to create business value through Generative AI applications. This role involves acting as a technical advisor, running demonstrations and POCs, collaborating with sales, building relationships with customer leadership, delivering feedback to product/engineering/research, and building educational content. Requires 5+ years in a customer-facing technical role with 2+ years in pre-sales, strong technical background in AI/ML/GPU, understanding of LLM training/fine-tuning/inference, Python/JavaScript proficiency, and familiarity with infrastructure services. | Serve | 7 |