Senior Staff Software Engineer, Indexing & Retrieval Platform

Reddit · Consumer · United States · Remote · Machine Learning

Senior Staff Software Engineer for Reddit's ML Indexing & Retrieval Platform team, focusing on building and scaling infrastructure for ML-driven recommendations, data ingestion, low-latency retrieval, and GenAI applications. The role involves leading technical strategy, architecture, and implementation of next-generation systems, partnering with product engineers, defining best practices for observability and reliability, and mentoring other engineers.

What you'd actually do

  1. Lead the technical strategy, architecture, and implementation of Reddit’s next-generation ML Indexing & Retrieval engine, integrating capabilities across lexical and vector indexing, low-latency retrieval, and emerging GenAI applications.
  2. Partner closely with product engineers across Content Understanding, Search, Feeds, Ads, Growth, and Safety to deliver high-quality experiences.
  3. Define best practices for observability, reliability, and operational excellence in large-scale distributed systems.
  4. Mentor and guide engineers in designing scalable infrastructure and adopting robust DevOps and SRE principles.
  5. Collaborate with infrastructure and ML teams to ensure the platform evolves to meet the needs of Reddit’s growing user base and diverse content ecosystem.

Skills

Required

  • Go, Java, Python, or any object-oriented programming language
  • Flink, Airflow, Spark for large-scale batch & stream processing
  • Vector, Lexical & Key-Value Databases
  • Kubernetes, Docker, AWS, GCP
  • Indexing and Retrieval systems
  • technical leadership
  • architecting and scaling distributed systems
  • large-scale data platforms
  • batch indexing and stream processing
  • large-scale, low-latency retrieval services
  • lexical and vector search retrieval technologies
  • cloud-native architectures
  • containerized workloads
  • communicator and mentor

Nice to have

  • Milvus, Vespa, or Elasticsearch

What the JD emphasized

  • next-generation ML Indexing & Retrieval engine
  • emerging GenAI applications
  • large-scale distributed systems
  • large-scale, low-latency retrieval services

Other signals

  • ML data ingestion
  • low-latency retrieval services
  • end-to-end lifecycle management of data
  • GenAI applications
  • Content Understanding
  • Semantic and lexical retrieval