Software Engineer - BIS (Baseten Inference Stack)

Baseten · Data AI · San Francisco, CA · EPD

Software Engineer on Baseten's Inference Stack team, building and operating the distributed runtime for large-scale LLM inference. The role spans the stack, from developer experience to low-level infrastructure, and is responsible for the performance, scalability, and reliability of AI model deployments.

What you'd actually do

  1. Develop infrastructure and orchestration systems for deploying and managing large-scale distributed LLM inference
  2. Work across the stack, from customer-facing features to low-level infrastructure components
  3. Build platform capabilities related to routing, autoscaling, scheduling, observability, and runtime management
  4. Improve the reliability, scalability, and usability of our inference stack
  5. Collaborate closely with Model Performance engineers to make new inference optimizations broadly available to customers and easy to configure

Skills

Required

  • distributed systems
  • backend infrastructure
  • platform engineering
  • production systems
  • developer experience
  • debugging complex systems

Nice to have

  • Kubernetes
  • Dynamo
  • vLLM
  • SGLang
  • TensorRT-LLM
  • distributed scheduling
  • autoscaling
  • service orchestration
  • GPU workloads
  • observability tooling
  • CI/CD systems
  • release automation
  • open-source infrastructure
  • ML systems

What the JD emphasized

  • reliability, latency, and scale are first-class concerns
  • debug complex production systems
  • genuine interest in inference engineering

Other signals

  • building inference stack
  • large-scale LLM inference
  • distributed systems
  • production systems