Senior AI Infrastructure Engineer, Model Serving Platform

Scale AI · Data AI · New York, NY +1 · Research

Senior AI Infrastructure Engineer role focused on building and maintaining scalable, reliable, and efficient platforms for serving LLMs. The role involves designing backend systems, integrating models, developing monitoring solutions, and leading projects end-to-end. It requires strong programming skills and experience with LLM serving fundamentals and container orchestration.

What you'd actually do

  1. Build and maintain fault-tolerant, high-performance systems for serving LLM workloads at scale.
  2. Build an internal platform to empower LLM capability discovery.
  3. Collaborate with researchers and engineers to integrate and optimize models for production and research use cases.
  4. Conduct architecture and design reviews to uphold best practices in system design and scalability.
  5. Develop monitoring and observability solutions to ensure system health and performance.

Skills

Required

  • Python
  • Go
  • Rust
  • C++
  • LLM serving
  • LLM routing
  • rate limiting
  • token streaming
  • load balancing
  • budgets
  • LLM capabilities
  • reasoning
  • tool calling
  • prompt templates
  • Docker
  • Kubernetes
  • AWS
  • GCP
  • Terraform

Nice to have

  • vLLM
  • SGLang
  • TensorRT-LLM
  • text-generation-inference

What the JD emphasized

  • 5+ years of experience building large-scale, high-performance backend systems
  • Experience with LLM serving and routing fundamentals
  • Experience with modern LLM serving frameworks

Other signals

  • building platforms for scalable, reliable, and efficient serving of LLMs
  • integrating and optimizing models for production and research use cases
  • LLM serving and routing fundamentals
  • modern LLM serving frameworks