Software Engineer, Machine Learning Infrastructure

Whatnot · Consumer · San Francisco, CA · Engineering

The Software Engineer, Machine Learning Infrastructure at Whatnot focuses on building and scaling the core infrastructure behind AI and ML models, including low-latency serving of large models and distributed training and inference pipelines.

What you'd actually do

  1. Own the infrastructure powering AI and ML models across critical business surfaces, supporting growth, recommendations, trust and safety, fraud, seller tooling, and more.
  2. Prototype, deploy, and productionize novel ML architectures that directly shape user experience and marketplace dynamics.
  3. Design and scale inference infrastructure capable of serving large models with low latency and high throughput.
  4. Build distributed training and inference pipelines leveraging GPUs and both model and data parallelism (see the sketch after this list).
  5. Stretch beyond your comfort zone to take on new technical challenges as we scale AI across Whatnot’s ecosystem.
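
To make item 4 concrete, here is a minimal data-parallel training sketch using PyTorch's DistributedDataParallel. The toy linear model, random dataset, and hyperparameters are illustrative placeholders, not anything from Whatnot's stack; model parallelism (sharding a single model across GPUs when it cannot fit on one) is the complementary technique and is omitted for brevity.

```python
# Minimal data-parallel training sketch (PyTorch DDP).
# The model and dataset below are stand-ins for illustration only.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def main():
    # torchrun sets RANK/LOCAL_RANK/WORLD_SIZE; one process per GPU.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(128, 2).cuda(local_rank)
    # DDP replicates the model on each GPU and all-reduces gradients per step.
    model = DDP(model, device_ids=[local_rank])

    dataset = TensorDataset(torch.randn(10_000, 128),
                            torch.randint(0, 2, (10_000,)))
    # DistributedSampler shards the data so each rank sees a distinct slice.
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=256, sampler=sampler)

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    loss_fn = torch.nn.CrossEntropyLoss()

    for epoch in range(3):
        sampler.set_epoch(epoch)  # reshuffle shard assignment each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()  # gradients are synchronized across ranks here
            optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()  # launch with: torchrun --nproc_per_node=<num_gpus> train.py
```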

Skills

Required

  • 4+ years of professional experience developing machine learning systems and algorithms
  • Bachelor’s degree in Computer Science, Statistics, Applied Mathematics, or a related technical field, or equivalent work experience
  • 3+ years of software engineering experience building and maintaining production systems for consumer-scale loads
  • 1+ years of professional experience developing software in Python
  • Ability to work autonomously, drive initiatives across multiple product areas, and communicate findings to leadership and product teams
  • Experience with operational, search, and key-value databases such as PostgreSQL, DynamoDB, Elasticsearch, Redis
  • Firm grasp of visualization tools for monitoring and logging, e.g. Datadog, Grafana
  • Familiarity with cloud computing platforms and managed services such as AWS SageMaker, Lambda, Kinesis, S3, EC2, EKS/ECS, Apache Kafka, and Flink
  • Professionalism in remote collaboration, with well-tested, reproducible work
  • Exceptional documentation and communication skills

What the JD emphasized

  • productionize novel ML architectures
  • scale inference infrastructure
  • distributed training and inference pipelines

Other signals

  • building systems that make advanced ML dependable and fast at scale
  • low-latency, large-model serving (see the sketch below)
  • distributed training & high-throughput GPU inference
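
To make those serving signals concrete, below is a minimal dynamic micro-batching sketch in Python with asyncio and PyTorch, the standard pattern for trading tail latency against GPU throughput. The tiny linear model, queue, batch cap, and 5 ms wait budget are illustrative assumptions; a production server would sit behind HTTP/gRPC and run inference on GPU.

```python
# Minimal dynamic micro-batching sketch for model serving.
# MODEL, MAX_BATCH, and MAX_WAIT_MS are illustrative placeholders.
import asyncio

import torch

MODEL = torch.nn.Linear(128, 2).eval()  # stand-in for a large model
MAX_BATCH = 32     # upper bound per forward pass (throughput knob)
MAX_WAIT_MS = 5    # max time a request may queue (latency budget)

queue: asyncio.Queue = asyncio.Queue()


async def batcher() -> None:
    """Drain the queue into micro-batches; one forward pass per batch."""
    loop = asyncio.get_running_loop()
    while True:
        batch = [await queue.get()]  # block until at least one request
        deadline = loop.time() + MAX_WAIT_MS / 1000
        # Accept more requests until the batch fills or the deadline passes.
        while len(batch) < MAX_BATCH and (left := deadline - loop.time()) > 0:
            try:
                batch.append(await asyncio.wait_for(queue.get(), left))
            except asyncio.TimeoutError:
                break
        inputs = torch.stack([x for x, _ in batch])
        with torch.no_grad():
            outputs = MODEL(inputs)  # amortize overhead across the batch
        for (_, fut), row in zip(batch, outputs):
            fut.set_result(row)


async def predict(x: torch.Tensor) -> torch.Tensor:
    """Enqueue a single request and await its row of the batched output."""
    fut = asyncio.get_running_loop().create_future()
    await queue.put((x, fut))
    return await fut


async def main() -> None:
    server = asyncio.create_task(batcher())
    results = await asyncio.gather(*(predict(torch.randn(128))
                                     for _ in range(100)))
    print(f"served {len(results)} predictions")
    server.cancel()


if __name__ == "__main__":
    asyncio.run(main())
```

The deadline bounds the worst-case latency a request can accrue while waiting, and MAX_BATCH bounds the forward-pass size; tuning the two against each other is the core knob in this kind of serving system.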