Senior Software Engineer, Machine Learning Infrastructure - Generative AI

DoorDash · Consumer · San Francisco, CA · 313 Infrastructure Engineering

Senior Software Engineer on the GenAI Platform team, responsible for building and owning the production infrastructure for Generative AI, focusing on open-weights model serving (inference and fine-tuning) and related platform components like gateways, evals, and guardrails. The role involves leading technical direction, optimizing cost/performance, and partnering with other teams to enable GenAI product development.

What you'd actually do

  1. Lead the design of infrastructure that helps DoorDash teams move GenAI ideas from prototype to production, increasing the velocity of business impact from AI across the company.
  2. Own and evolve our open-weights serving stack — real-time GPU endpoints, high-throughput batch inference, and fine-tuning (SFT/DPO/LoRA) — alongside the LLM Gateway, Agent Gateway, evals infrastructure, guardrails, and cost attribution.
  3. Architect scalable, high-performance systems for model serving, batch inference, GPU autoscaling, and fine-tuning that power real customer and internal automation use cases.
  4. Push the cost and latency frontier of GPU inference — turning batch jobs that took days into hours and cutting inference cost by multiples — while giving product teams a clean choice across open-weight and closed-source models with reliability, fallback, observability, and cost controls built in.
  5. Build platforms that support rapid experimentation while meeting production standards for latency, scale, monitoring, SLOs, playbooks, and operational excellence.

Skills

Required

  • Python
  • distributed systems
  • designing and owning production services, APIs, data pipelines, or ML infrastructure at scale
  • running systems in production, including observability, debugging, reliability, incident response, and performance/cost optimization
  • LLM inference and/or fine-tuning of open-weight models in production — serving (latency, throughput, batching, autoscaling, GPU utilization) and/or fine-tuning (SFT/DPO/LoRA)
  • technical leadership: leading design across ambiguous, fast-moving technical areas, mentoring engineers, and turning customer use cases into reusable platform capabilities

Nice to have

  • LLM inference engines and serving frameworks (e.g., vLLM, SGLang, TensorRT-LLM) in production
  • distributed/multi-node fine-tuning and training pipelines (SFT, DPO/RLHF, LoRA), including data preparation and evaluation
  • GPU performance work — multi-node/distributed inference, KV-cache/memory optimization, quantization (FP8/INT8/AWQ/GPTQ), or cold-start/throughput tuning
  • Kubernetes, cloud infrastructure (AWS/GCP), GPUs, serverless/elastic GPU platforms (e.g., Modal), or high-throughput batch systems
  • LLM gateways, model routing, vendor abstraction, or cost attribution
  • developer platforms, internal platforms, or self-serve infrastructure
  • building and deploying AI agents or MCP servers in production
  • eval systems, LLM observability, tracing, RAG, search, or vector databases

What the JD emphasized

  • open-weights model platform spanning inference and fine-tuning
  • real-time GPU serving
  • high-throughput batch inference
  • model fine-tuning
  • cost/performance frontier of GPU inference and fine-tuning
  • LLM inference and/or fine-tuning of open-weight models in production

Other signals

  • building shared infrastructure for GenAI-powered products
  • running frontier open-weight LLMs and VLMs
  • real-time GPU serving, high-throughput batch inference, and fine-tuning
  • LLM Gateway, Agent Gateway, evals infrastructure, guardrails, and cost attribution