Research Scientist, Infrastructure System Lab

ByteDance · Big Tech · San Jose, CA · Infrastructure

Research Scientist role focused on designing and optimizing state-of-the-art vector indexing algorithms for large-scale similarity search and retrieval, powering next-generation vector databases. The work involves research into ANN search, optimization for performance, and collaboration with engineering for productionization, with a strong emphasis on academic publications and staying current with AI x systems research.

What you'd actually do

  1. Research and develop new algorithms for approximate nearest neighbor (ANN) search, especially for filtered, hybrid, or disk-based scenarios.
  2. Optimize existing algorithms for scalability, low latency, small memory footprint, and hybrid search support.
  3. Collaborate with engineering teams to prototype, benchmark, and productionize indexing solutions.
  4. Contribute to academic publications, open-source libraries, or internal technical documentation.
  5. Stay current with research trends in vector search, retrieval systems, retrieval-augmented generation (RAG), large language models (LLMs), and related areas.

Skills

Required

  • PhD in Computer Science, Applied Mathematics, Electrical Engineering, or a related technical field.
  • Strong publication record in accredited venues (e.g., SIGMOD, VLDB, SIGIR, NeurIPS, ICML) related to vector search, indexing, IR, or ML.
  • Deep understanding of ANN algorithms, quantization, graph-based indexes, and partition-based indexes.
  • Strong system-level thinking: ability to profile, benchmark, and optimize performance across CPU, memory, and storage layers.
  • Proficiency in C++ and/or Python, with experience in implementing and benchmarking algorithms.

Nice to have

  • Experience building or contributing to vector databases or retrieval engines in production.
  • Familiarity with frameworks like FAISS, ScaNN, HNSWLib, or DiskANN.
  • Understanding of distributed systems and/or GPU-accelerated search.
  • Experience with hybrid search (dense + sparse), multi-modal retrieval, or retrieval for LLMs.
  • Passion for bridging theory and practice in production-scale systems.

What the JD emphasized

  • Strong publication record in accredited venues (e.g., SIGMOD, VLDB, SIGIR, NeurIPS, ICML) related to vector search, indexing, IR, or ML.
  • Deep understanding of ANN algorithms, quantization, graph-based indexes, and partition-based indexes.
  • Strong system-level thinking: ability to profile, benchmark, and optimize performance across CPU, memory, and storage layers.

Other signals

  • vector indexing algorithms
  • similarity search
  • retrieval systems
  • RAG
  • LLMs