Together AI

Scaling · AI Frontier · Open-source model infra
HQ: San Francisco, US
Website: together.ai
Currently tracking 20 active AI roles, with 14 new openings in the last 4 weeks. Primary focus: Serve · Engineering. Salary range $160k–$300k (avg $226k).

Hiring: 20 / 20 AI roles
Momentum (4w): 0 (0%) — 14 roles opened in the last 4 weeks vs 14 in the prior 4 weeks
Salary range: $160k–$300k (avg $226k; USD, disclosed roles only)
Tracked since: Jan '24 · last role posted today
Hiring velocity (new roles per week):
Jan 15: 2 · Jun 3: 1 · Jan 13: 2 · Jan 20: 1 · Jan 27: 1 · Feb 24: 1 · Mar 24: 1 · Apr 28: 1 · May 12: 1 · Jun 2: 3 · Jun 23: 1 · Aug 18: 2 · Aug 25: 1 · Nov 3: 1 · Nov 17: 1 · Jan 5: 1 · Jan 19: 3 · Feb 16: 1 · Feb 23: 3 · Mar 2: 1 · Mar 9: 7 · Mar 30: 3 · Apr 6: 7 · Apr 13: 1 · Apr 27: 4 · May 4: 2

Jobs (20)

20 AI roles · 52 total active
Title · Stage · Function · Location · First seen · AI score
Research Engineer, Core ML
Research Engineer role focused on improving inference efficiency and unifying it with RL/post-training systems for production-grade AI APIs. The role involves end-to-end ownership of critical systems, translating frontier ideas into robust infrastructure, and shipping measurable improvements in latency, throughput, cost, and model quality at scale.
Serve · Post-train · Research · San Francisco, CA · Feb 18 · 10
Forward Deployed Engineer (Inference & Post-Training)
Forward Deployed Engineer focused on optimizing inference engines and fine-tuning pipelines for production AI teams, acting as a technical partner to strategic customers. Responsibilities include inference engine optimization, performance tuning, post-training/fine-tuning (LoRA, SFT, DPO, RLHF, GRPO), customer alignment, onboarding, and providing product feedback.
Serve · Post-train · Engineering · San Francisco, CA · 3d ago · 9
Senior Machine Learning Engineer, Voice AI
Senior ML Engineer focused on optimizing the model serving layer for voice AI workloads, including speech-to-text and text-to-speech models. The role involves hands-on work with inference engines, GPU optimization, batching strategies, and ensuring new model architectures can be productionized efficiently. The goal is to achieve best-in-class latency and reliability for real-time voice applications.
Serve · Engineering · San Francisco, CA · 6w ago · 9
Research Engineer, Frontier Speculative Decoding
Research Engineer focused on translating internal model training research into production-ready deployments by fine-tuning general-purpose models into specialized tools. This involves designing novel speculative algorithms, data curation, hyperparameter tuning, and checkpoint evaluation, with a focus on accuracy-efficiency tradeoffs for generative AI models.
Post-train · Serve · Research · San Francisco, CA · Nov '25 · 9
Systems Research Engineer, GPU Programming
This role focuses on optimizing and developing GPU-accelerated kernels and algorithms for ML/AI applications, requiring expertise in GPU programming (CUDA, Triton) and performance profiling. The engineer will collaborate with modeling, hardware, and software teams to enhance AI system efficiency and co-design GPU architectures.
Serve · Engineering · San Francisco, CA · Jan '24 · 9
AI Researcher, Core ML (Turbo)
AI Researcher focused on the intersection of efficient inference algorithms, architectures, engines, and post-training/RL systems for production-scale API services. The role involves advancing inference efficiency, unifying inference with RL/post-training, and owning critical systems.
Serve · Post-train · Engineering · San Francisco, CA · Jan '24 · 9
Forward Deployed Engineer (GPU Clusters)
The Forward Deployed Engineer (FDE) will be a technical partner to customers building large-scale AI models, focusing on GPU cluster infrastructure, networking, storage, and orchestration to ensure stability, optimize performance, and facilitate platform adoption. This role involves hardening clusters, tuning orchestration layers (Kubernetes/SLURM), debugging low-level bottlenecks, building reference designs, and leading benchmarking exercises.
Serve · Engineering · San Francisco, CA · 1w ago · 8
Engineering Manager, Model Serving
Engineering Manager for Together AI's Model Serving platform, focusing on delivering world-class inference and fine-tuning in public APIs and customer deployments. Responsibilities include owning SLAs, improving testing/deployment/monitoring, building self-serve tooling, defining configuration best practices for inference engines, leading incident response, and mentoring team members. Requires 5+ years operating production ML inference or training systems at scale and 2+ years in senior IC or tech lead roles, with deep expertise in Kubernetes, multi-cluster orchestration, and ML serving frameworks.
Serve · Post-train · Engineering · San Francisco, CA · Mar 5 · 8
LLM Inference Frameworks and Optimization Engineer
Seeking an Inference Frameworks and Optimization Engineer to design, develop, and optimize distributed inference engines for multimodal and language models. Focus on low-latency, high-throughput inference, GPU/accelerator optimizations, and software-hardware co-design for efficient large-scale AI deployment.
Serve · Engineering · Remote · Mar '25 · 8
Machine Learning Engineer
Machine Learning Engineer at Together AI focused on developing and scaling production systems for LLM inference and fine-tuning APIs. Requires strong experience in high-performance, distributed systems and the LLM inference ecosystem.
Serve · Post-train · Engineering · San Francisco, CA · Jan '25 · 8
Machine Learning Engineer - Inference
Machine Learning Engineer focused on optimizing and enhancing the performance of AI inference systems, working with state-of-the-art large language models to ensure efficient and effective operation at scale. Responsibilities include designing and building production systems, optimizing runtime inference services, and creating supporting tools and documentation.
Serve · Engineering · San Francisco, CA · Jun '24 · 8
Senior Platform Engineer, Voice AI
Senior Platform Engineer for Together AI's Voice AI platform, focusing on the API and infrastructure layer for real-time speech-to-text and text-to-speech models. The role involves building WebSocket and HTTP APIs, designing autoscaling for latency-sensitive streaming, and ensuring platform reliability for production voice agents.
Serve · Engineering · San Francisco, CA · 6w ago · 7
Backend Engineer
Senior Backend/Distributed Systems Engineer to build and maintain the Together AI Sandbox service, focusing on API platform performance, reliability, and scalability. Responsibilities include designing core backend components, performing research for AI workloads, and ensuring code quality through design and code reviews.
Serve · Engineering · Amsterdam, Netherlands · Mar 10 · 7
Together Cloud Infrastructure Engineer
This role focuses on building and maintaining the AI cloud infrastructure, including services for hardware management, IaaS software layer for GPU data centers, high-performance object storage for pretraining, and advanced observability stacks. The engineer will work on the core Together AI platform, create services and tools, and develop testing frameworks for robustness and fault-tolerance.
Serve · Data · Engineering · Amsterdam, Netherlands · Jan 20 · 7
Staff Engineer, Distributed Storage, HPC & AI Infrastructure
Staff Engineer focused on designing and delivering multi-petabyte distributed storage systems optimized for AI training and inference workloads. Responsibilities include architecting high-performance parallel filesystems and object stores, integrating cutting-edge technologies, driving cost optimization, and building Kubernetes-native storage operators and self-service platforms. The role requires deep expertise in distributed storage, Kubernetes, and performance optimization for GPU/HPC clusters, with strong coding skills in Go and Python.
Serve · Engineering · Amsterdam, Netherlands · Jan 20 · 7
Senior Backend Engineer, Inference Platform
Senior Backend Engineer focused on building and optimizing the inference platform for advanced generative AI models, including LLMs and multimodal models, at scale. The role involves optimizing latency, throughput, and resource allocation across tens of thousands of GPUs, collaborating with researchers to productionize frontier models, and contributing to open-source inference projects.
Serve · Engineering · San Francisco, CA · Aug '25 · 7
Machine Learning Platform Engineer
Machine Learning Platform Engineer at Together AI, focusing on building a container platform, optimizing autoscaling, minimizing cold starts, and improving end-to-end model performance for custom models and dedicated inference. The role involves optimizing inference across the stack, including CUDA kernels, PyTorch, inference engines, and container orchestration.
Serve · Engineering · San Francisco, CA · Aug '25 · 7
AI Infrastructure Engineer
AI Infrastructure Engineer responsible for keeping user-facing services and production systems running smoothly, applying engineering principles and automation to operating environments. Focuses on systems, availability, reliability, and scalability, with interests in algorithms and distributed systems. Builds and runs infrastructure using Ansible, Terraform, and Kubernetes, and designs monitoring systems.
Serve · Engineering · San Francisco, CA · Jun '25 · 7
Senior Software Engineer - Together Cloud Infrastructure
Senior Software Engineer focused on building and operating a high-performance, global AI cloud infrastructure platform. This includes designing and maintaining backend services for hardware management, IaaS software layer for GPU data centers, high-performance object storage for pretraining datasets, and advanced observability stacks for distributed pretraining. The role also involves architecture and research for decentralized AI workloads and contributing to the open-source platform.
Serve · Data · Engineering · San Francisco, CA · Jun '25 · 7
Solutions Architect
Solutions Architect at Together AI working with customers and prospects to create business value through Generative AI applications. This role involves acting as a technical advisor, running demonstrations and POCs, collaborating with sales, building relationships with customer leadership, delivering feedback to product/engineering/research, and building educational content. Requires 5+ years in a customer-facing technical role with 2+ years in pre-sales, a strong technical background in AI/ML/GPU, understanding of LLM training/fine-tuning/inference, Python/JavaScript proficiency, and familiarity with infrastructure services.
Serve · Engineering · San Francisco, CA · Jan '25 · 7