Research Scientist, Infrastructure System Lab

ByteDance · Big Tech · Seattle, WA · Infrastructure

Research Scientist role focused on designing and optimizing state-of-the-art vector indexing algorithms for large-scale similarity search, filtered search, and hybrid retrieval. The work involves developing new algorithms, tuning their performance, and working with engineering teams to bring them to production, with a strong emphasis on academic publication and staying current with AI x systems research.

What you'd actually do

  1. Research and develop new algorithms for approximate nearest neighbor (ANN) search, especially for filtered, hybrid, or disk-based scenarios.
  2. Optimize existing algorithms for scalability, low latency, memory footprint, and hybrid search support.
  3. Collaborate with engineering teams to prototype, benchmark, and productionize indexing solutions.
  4. Contribute to academic publications, open-source libraries, or internal technical documentation.
  5. Stay current with research trends in vector search, retrieval systems, retrieval-augmented generation (RAG), large language models (LLMs), and related areas.

Skills

Required

  • PhD in Computer Science, Applied Mathematics, Electrical Engineering, or a related technical field
  • Strong publication record in accredited venues (e.g., SIGMOD, VLDB, SIGIR, NeurIPS, ICML) related to vector search, indexing, IR, or ML
  • Deep understanding of ANN algorithms, quantization, graph-based indexes, and partition-based indexes
  • Strong system-level thinking: ability to profile, benchmark, and optimize performance across CPU, memory, and storage layers
  • Proficiency in C++ and/or Python, with experience in implementing and benchmarking algorithms

Nice to have

  • Experience building or contributing to vector databases or retrieval engines in production
  • Familiarity with frameworks like FAISS, ScaNN, HNSWLib, or DiskANN
  • Understanding of distributed systems and/or GPU-accelerated search
  • Experience with hybrid search (dense + sparse), multi-modal retrieval, or retrieval for LLMs

What the JD emphasized

  • Strong publication record in accredited venues
  • Deep understanding of ANN algorithms
  • Strong system-level thinking
  • vector databases
  • retrieval engines

Other signals

  • vector indexing algorithms
  • similarity search
  • filtered search
  • hybrid retrieval
  • ANN algorithms
  • retrieval-augmented generation (RAG)
  • LLMs