Forward Deployed Engineer (Inference & Post-Training)

Together AI · Data AI · San Francisco, CA · Customer Success

Forward Deployed Engineer who optimizes inference engines and fine-tuning pipelines for production AI teams and acts as a technical partner to strategic customers. Responsibilities include inference engine optimization, performance tuning, post-training and fine-tuning (LoRA, SFT, DPO, RLHF, GRPO), customer alignment, onboarding, and product feedback.

What you'd actually do

  1. Inference Engine Optimization: Select, configure, and optimize the right inference engine based on hardware, model architecture, and workload profile.
  2. Configuration & Performance Tuning: Develop configuration updates that win critical POCs and benchmarks and optimize customer deployments. This means tuning the KV cache, applying speculative decoding, choosing the optimal tensor-parallel degree, and selecting a quantization strategy to hit throughput and latency targets.
  3. Post-Training & Fine-Tuning: Drive hands-on RL training runs and optimize system design; guide customers through LoRA, SFT, DPO, RLHF, and GRPO pipelines from experimentation through production.
  4. Strategic Customer Alignment: Act as the primary technical point of contact for aligned strategic accounts. Monitor and optimize endpoint configurations, help customers get the most out of the platform, and collaborate with them to hit critical milestones.
  5. Opinionated Onboarding: Establish direct alignment with strategic customers at onboarding; ensure the right inference and post-training configurations are in place from day one to improve time-to-value.

Skills

Required

  • Python
  • vLLM
  • TensorRT-LLM
  • SGLang
  • KV cache tuning
  • speculative decoding
  • tensor parallelism
  • pipeline parallelism
  • quantization techniques
  • LoRA
  • SFT
  • DPO
  • RLHF
  • GRPO

Nice to have

  • open-source LLM deployment
  • model selection
  • production environments

What the JD emphasized

  • inference systems
  • open-source LLM deployment
  • post-training workflows
  • inference engine
  • performance issues
  • fine-tuning
  • post-training pipelines
  • Python skills

Other signals

  • inference optimization
  • fine-tuning pipelines
  • production deployment
  • customer success
  • platform adoption