LLM Platform Engineer

Whatnot · Consumer · San Francisco, CA · Engineering

Whatnot is seeking an LLM Platform Engineer to design and scale the core infrastructure behind its large language model applications. The role involves building retrieval systems, grounding LLM responses, developing evaluation frameworks, and implementing feedback pipelines to bring AI into production across business surfaces such as growth, recommendations, trust and safety, and fraud.

What you'd actually do

  1. Own the infrastructure powering LLMs across critical business surfaces, supporting growth, recommendations, trust and safety, fraud, seller tooling, and more.
  2. Create robust, scalable LLM evaluation frameworks that measure model performance, guide iteration, and prevent regressions via CI/CD (a minimal regression-gate sketch follows this list).
  3. Deploy RAG systems and MCP servers to ground LLM responses more effectively in Whatnot’s business context while enforcing rigorous PII controls (see the grounding sketch below).
  4. Design efficient human-in-the-loop feedback pipelines that inform scalable LLM evaluation (see the feedback sketch below).
  5. Bridge the gap between research and production, helping transform experimental ideas into scalable solutions.
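
To make items 2 through 4 concrete, the sketches below are illustrative only: they are not Whatnot code, and every dataset, threshold, and helper name in them is a hypothetical stand-in. First, a minimal regression gate for evaluation (item 2): score a golden set of prompts and fail the CI step when the pass rate drops below an assumed baseline.

```python
"""Minimal LLM eval regression gate; all names, data, and thresholds are hypothetical."""

# Hypothetical golden set: prompts paired with substrings a good answer must contain.
GOLDEN_CASES = [
    {"prompt": "How do refunds work?", "must_contain": "refund"},
    {"prompt": "Can sellers schedule shows?", "must_contain": "schedule"},
]

BASELINE_PASS_RATE = 0.95  # assumed CI threshold; tune per product surface


def call_model(prompt: str) -> str:
    """Stub standing in for the production LLM client."""
    return f"Echo: {prompt}"  # replace with a real completion call


def run_eval() -> float:
    """Return the fraction of golden cases the model currently passes."""
    passed = sum(
        case["must_contain"].lower() in call_model(case["prompt"]).lower()
        for case in GOLDEN_CASES
    )
    return passed / len(GOLDEN_CASES)


if __name__ == "__main__":
    rate = run_eval()
    print(f"pass rate: {rate:.0%}")
    # A non-zero exit code fails the CI step, blocking the regressing change.
    raise SystemExit(0 if rate >= BASELINE_PASS_RATE else 1)
```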
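
Next, a toy retrieval-and-grounding loop for item 3. Keyword-overlap ranking stands in for an embedding index, and a regex scrub stands in for a real PII classifier; both are deliberate simplifications, and the documents are made up.

```python
"""Toy RAG grounding with a PII scrub; all documents here are hypothetical."""
import re

# Stand-in document store; a real system would query a vector index.
DOCUMENTS = [
    "Sellers can schedule live shows from the seller dashboard.",
    "For payout issues, buyers should email support at help@example.com.",
    "Buyers are refunded automatically when a show is cancelled.",
]

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")


def tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def redact_pii(text: str) -> str:
    """Scrub email addresses before they can enter the model context."""
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)


def retrieve(query: str, k: int = 2) -> list[str]:
    """Naive keyword-overlap ranking in place of embedding search."""
    q = tokens(query)
    return sorted(DOCUMENTS, key=lambda d: -len(q & tokens(d)))[:k]


def grounded_prompt(query: str) -> str:
    """Build a prompt that confines the model to redacted, retrieved context."""
    context = "\n".join(redact_pii(d) for d in retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    print(grounded_prompt("How are buyers refunded?"))
```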
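
Finally, for item 4, one way human review can feed evaluation: promote reviewer-approved interactions into new golden cases so the eval set grows with production traffic. The record schema and verdict values are assumptions.

```python
"""Promote human-reviewed LLM interactions into eval cases (hypothetical schema)."""
import json
from dataclasses import dataclass


@dataclass
class FeedbackRecord:
    prompt: str
    response: str
    verdict: str  # "approve" or "reject", assigned by a human reviewer


def to_eval_cases(records: list[FeedbackRecord]) -> list[dict]:
    """Keep only reviewer-approved interactions as regression-test cases."""
    return [
        {"prompt": r.prompt, "expected": r.response}
        for r in records
        if r.verdict == "approve"
    ]


if __name__ == "__main__":
    records = [
        FeedbackRecord("What is a live auction?", "A timed sale during a show.", "approve"),
        FeedbackRecord("How do payouts work?", "Ask support.", "reject"),
    ]
    print(json.dumps(to_eval_cases(records), indent=2))
```

Cases produced this way could be appended to the golden set consumed by the regression gate in the first sketch, closing the loop between feedback and evaluation.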

Skills

Required

  • Python
  • production systems
  • consumer-scale loads
  • PostgreSQL
  • DynamoDB
  • Elasticsearch
  • Redis
  • Datadog
  • Grafana
  • AWS SageMaker
  • AWS Lambda
  • AWS Kinesis
  • AWS S3
  • AWS EC2
  • AWS EKS/ECS
  • Apache Kafka
  • Apache Flink
  • remote working environment
  • documentation
  • communication

Nice to have

  • LLM infrastructure
  • RAG systems
  • LLM evaluation frameworks
  • human-in-the-loop feedback

What the JD emphasized

  • 3+ years of software engineering experience building and maintaining production systems for consumer-scale loads
  • 1+ years of professional experience developing software in Python
  • Ability to work autonomously, drive initiatives across multiple product areas, and communicate findings to leadership and product teams
