AI Hire Signal

Tracking AI hiring across 200+ US tech companies. Stage, salary, and stack signals on every role — refreshed weekly.

© 2026 AI Hire Signal · Not affiliated with companies shown
Together AI

Data · AI · Open-source model infra

HQ: San Francisco, US · Website: together.ai

Currently tracking 20 active AI roles, with 14 new openings in the last 4 weeks. Primary focus: Serve · Engineering. Salary range $160k–$300k (avg $226k).

Hiring: 20 / 20 roles active
Momentum (4w): 0% (14 openings in the last 4 weeks vs 14 in the prior 4 weeks)
Salary range: $160k–$300k, avg $226k (USD, disclosed roles only)
Tracked since: Jan '24 (last role posted yesterday)
Hiring velocity (new roles per week; weeks with no openings omitted)

2024: Jan 15 (2) · Jun 3 (1)
2025: Jan 13 (2) · Jan 20 (1) · Jan 27 (1) · Feb 24 (1) · Mar 24 (1) · Apr 28 (1) · May 12 (1) · Jun 2 (3) · Jun 23 (1) · Aug 18 (2) · Aug 25 (1) · Oct 27 (1) · Nov 3 (1) · Nov 17 (1)
2026: Jan 5 (1) · Jan 19 (3) · Feb 16 (1) · Feb 23 (3) · Mar 2 (1) · Mar 9 (7) · Mar 30 (3) · Apr 6 (7) · Apr 13 (1) · Apr 27 (4) · May 4 (2)
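The 4-week momentum figure (14 openings in the last four weeks vs 14 in the prior four, hence 0%) can be reproduced as a simple rolling comparison. A minimal sketch — the weekly counts are the most recent eight data points from the chart, the `momentum` function name is illustrative, and treating only weeks with openings as window members is an assumption about how the site bins its windows:

```python
# Weekly new-role counts, most recent eight data points from the chart.
weekly_counts = {
    "Feb 23": 3, "Mar 2": 1, "Mar 9": 7, "Mar 30": 3,   # prior 4 weeks with openings
    "Apr 6": 7, "Apr 13": 1, "Apr 27": 4, "May 4": 2,   # last 4 weeks with openings
}

def momentum(last_4w: int, prior_4w: int) -> float:
    """Percent change in openings: last 4 weeks vs the prior 4 (illustrative)."""
    if prior_4w == 0:
        return float("inf") if last_4w else 0.0
    return 100.0 * (last_4w - prior_4w) / prior_4w

counts = list(weekly_counts.values())          # insertion order is chronological
prior, last = sum(counts[:4]), sum(counts[4:])
print(last, prior, momentum(last, prior))      # 14 14 0.0
```

With equal 4-week sums the percent change is zero, matching the 0% momentum shown above.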

Jobs (16)

20 AI roles · 53 total active
Filters: Country = United States · Active only · AI score ≥ 7
Stage: Serve (19) · Post-train (1)
Function: Engineering (18) · Research (2)
Country: United States (16) · Netherlands (3)
Sorted by AI score. Each listing shows stage · function · location · first seen · AI score.
Research Engineer, Core ML
Research Engineer role focused on improving inference efficiency and unifying it with RL/post-training systems for production-grade AI APIs. The role involves end-to-end ownership of critical systems, translating frontier ideas into robust infrastructure, and shipping measurable improvements in latency, throughput, cost, and model quality at scale.
Serve · Post-train · Research · San Francisco, CA · First seen Feb 18 · AI score 10
Forward Deployed Engineer (Inference & Post-Training)
Forward Deployed Engineer focused on optimizing inference engines and fine-tuning pipelines for production AI teams, acting as a technical partner to strategic customers. Responsibilities include inference engine optimization, performance tuning, post-training/fine-tuning (LoRA, SFT, DPO, RLHF, GRPO), customer alignment, onboarding, and providing product feedback.
Serve · Post-train · Engineering · San Francisco, CA · First seen 6d ago · AI score 9
Senior Machine Learning Engineer, Voice AI
Senior ML Engineer focused on optimizing the model serving layer for voice AI workloads, including speech-to-text and text-to-speech models. The role involves hands-on work with inference engines, GPU optimization, batching strategies, and ensuring new model architectures can be productionized efficiently. The goal is to achieve best-in-class latency and reliability for real-time voice applications.
Serve · Engineering · San Francisco, CA · First seen 6w ago · AI score 9
Research Engineer, Frontier Speculative Decoding
Research Engineer focused on translating internal model training research into production-ready deployments by fine-tuning general-purpose models into specialized tools. This involves designing novel speculative algorithms, data curation, hyperparameter tuning, and checkpoint evaluation, with a focus on accuracy-efficiency tradeoffs for generative AI models.
Post-train · Serve · Research · San Francisco, CA · First seen Nov '25 · AI score 9
Systems Research Engineer, GPU Programming
This role focuses on optimizing and developing GPU-accelerated kernels and algorithms for ML/AI applications, requiring expertise in GPU programming (CUDA, Triton) and performance profiling. The engineer will collaborate with modeling, hardware, and software teams to enhance AI system efficiency and co-design GPU architectures.
Serve · Engineering · San Francisco, CA · First seen Jan '24 · AI score 9
AI Researcher, Core ML (Turbo)
AI Researcher focused on the intersection of efficient inference algorithms, architectures, engines, and post-training/RL systems for production-scale API services. The role involves advancing inference efficiency, unifying inference with RL/post-training, and owning critical systems.
Serve · Post-train · Engineering · San Francisco, CA · First seen Jan '24 · AI score 9
Forward Deployed Engineer (GPU Clusters)
The Forward Deployed Engineer (FDE) will be a technical partner to customers building large-scale AI models, focusing on GPU cluster infrastructure, networking, storage, and orchestration to ensure stability, optimize performance, and facilitate platform adoption. This role involves hardening clusters, tuning orchestration layers (Kubernetes/SLURM), debugging low-level bottlenecks, building reference designs, and leading benchmarking exercises.
Serve · Engineering · San Francisco, CA · First seen 2w ago · AI score 8
Engineering Manager, Model Serving
Engineering Manager for Together AI's Model Serving platform, focusing on delivering world-class inference and fine-tuning in public APIs and customer deployments. Responsibilities include owning SLAs, improving testing/deployment/monitoring, building self-serve tooling, defining configuration best practices for inference engines, leading incident response, and mentoring team members. Requires 5+ years operating production ML inference or training systems at scale and 2+ years in senior IC or tech lead roles, with deep expertise in Kubernetes, multi-cluster orchestration, and ML serving frameworks.
Serve · Post-train · Engineering · San Francisco, CA · First seen Mar 5 · AI score 8
Machine Learning Engineer
Machine Learning Engineer at Together AI focused on developing and scaling production systems for LLM inference and fine-tuning APIs. Requires strong experience in high-performance, distributed systems and the LLM inference ecosystem.
Serve · Post-train · Engineering · San Francisco, CA · First seen Jan '25 · AI score 8
Machine Learning Engineer - Inference
Machine Learning Engineer focused on optimizing and enhancing the performance of AI inference systems, working with state-of-the-art large language models to ensure efficient and effective operation at scale. Responsibilities include designing and building production systems, optimizing runtime inference services, and creating supporting tools and documentation.
Serve · Engineering · San Francisco, CA · First seen Jun '24 · AI score 8
Senior Platform Engineer, Voice AI
Senior Platform Engineer for Together AI's Voice AI platform, focusing on the API and infrastructure layer for real-time speech-to-text and text-to-speech models. The role involves building WebSocket and HTTP APIs, designing autoscaling for latency-sensitive streaming, and ensuring platform reliability for production voice agents.
Serve · Engineering · San Francisco, CA · First seen 6w ago · AI score 7
Senior Backend Engineer, Inference Platform
Senior Backend Engineer focused on building and optimizing the inference platform for advanced generative AI models, including LLMs and multimodal models, at scale. The role involves optimizing latency, throughput, and resource allocation across tens of thousands of GPUs, collaborating with researchers to productionize frontier models, and contributing to open-source inference projects.
Serve · Engineering · San Francisco, CA · First seen Aug '25 · AI score 7
Machine Learning, Platform Engineer
Machine Learning Platform Engineer at Together AI, focusing on building a container platform, optimizing autoscaling, minimizing cold starts, and improving end-to-end model performance for custom models and dedicated inference. The role involves optimizing inference across the stack, including CUDA kernels, PyTorch, inference engines, and container orchestration.
Serve · Engineering · San Francisco, CA · First seen Aug '25 · AI score 7
AI Infrastructure Engineer
AI Infrastructure Engineer responsible for keeping user-facing services and production systems running smoothly, applying engineering principles and automation to operating environments. Focuses on systems, availability, reliability, and scalability, with interests in algorithms and distributed systems. Builds and runs infrastructure using Ansible, Terraform, and Kubernetes, and designs monitoring systems.
Serve · Engineering · San Francisco, CA · First seen Jun '25 · AI score 7
Senior Software Engineer - Together Cloud Infrastructure
Senior Software Engineer focused on building and operating a high-performance, global AI cloud infrastructure platform. This includes designing and maintaining backend services for hardware management, IaaS software layer for GPU data centers, high-performance object storage for pretraining datasets, and advanced observability stacks for distributed pretraining. The role also involves architecture and research for decentralized AI workloads and contributing to the open-source platform.
Serve · Data · Engineering · San Francisco, CA · First seen Jun '25 · AI score 7
Solutions Architect
Solutions Architect at Together AI to work with customers and prospects to create business value through Generative AI applications. This role involves acting as a technical advisor, running demonstrations and POCs, collaborating with sales, building relationships with customer leadership, delivering feedback to product/engineering/research, and building educational content. Requires 5+ years in a customer-facing technical role with 2+ years in pre-sales, strong technical background in AI/ML/GPU, understanding of LLM training/fine-tuning/inference, Python/JavaScript proficiency, and familiarity with infrastructure services.
Serve · Engineering · San Francisco, CA · First seen Jan '25 · AI score 7
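The listing's filter and sort controls (active only, AI score ≥ 7, country match, ordered by AI score) amount to a predicate plus a sort key. A minimal sketch — the field names, `filter_jobs` helper, and sample rows are illustrative, not the site's actual data model:

```python
from typing import TypedDict

class Job(TypedDict):
    title: str
    country: str
    active: bool
    ai_score: int

def filter_jobs(jobs: list[Job], country: str = "United States",
                min_ai_score: int = 7) -> list[Job]:
    """Keep active roles in the given country with AI score >= threshold,
    highest-scored first (mirrors the page's default view)."""
    kept = [j for j in jobs
            if j["active"] and j["country"] == country
            and j["ai_score"] >= min_ai_score]
    return sorted(kept, key=lambda j: j["ai_score"], reverse=True)

# Hypothetical sample rows based on titles from the listing above.
jobs: list[Job] = [
    {"title": "Solutions Architect", "country": "United States",
     "active": True, "ai_score": 7},
    {"title": "Research Engineer, Core ML", "country": "United States",
     "active": True, "ai_score": 10},
    {"title": "ML Engineer", "country": "Netherlands",
     "active": True, "ai_score": 8},
]
print([j["title"] for j in filter_jobs(jobs)])
# ['Research Engineer, Core ML', 'Solutions Architect']
```

The Netherlands row is dropped by the country filter, and the remaining roles come back in descending AI-score order, matching how the table above is sorted.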