Senior Software Engineer / Researcher, AI-Native Database Systems

ByteDance · Big Tech · San Jose, CA · Infrastructure

This role centers on architecting AI-native database systems that integrate structured, unstructured, and vector data; optimize for embedding ingestion and multimodal retrieval; and serve as reasoning engines and memory backends for AI agents. The work spans scalable vector search systems, AI-augmented query processors, and RAG infrastructure, with a strong emphasis on systems design and implementation in C++, Rust, or Go.

What you'd actually do

  1. Architect and implement AI-native databases that seamlessly integrate structured, unstructured, and vectorized data.
  2. Design storage engines optimized for embedding ingestion, multimodal retrieval, and real-time AI interaction.
  3. Build scalable and distributed vector search systems with low-latency guarantees.
  4. Develop AI-augmented query processors that leverage large language models (LLMs) for semantic parsing, intent understanding, and cost estimation.
  5. Collaborate on developing retrieval-augmented generation (RAG) infrastructure and LLM agent memory backends.
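To make item 3 concrete: a minimal sketch, in Go, of top-k retrieval by cosine similarity over an in-memory corpus. This is a brute-force linear scan for illustration only; the systems this role describes would replace it with an ANN index (e.g. HNSW or IVF, as implemented in Faiss or Milvus) to meet low-latency guarantees at scale. All names and the toy corpus below are illustrative, not from the JD.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cosine returns the cosine similarity between two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

type hit struct {
	id    int
	score float64
}

// topK scans the whole corpus and returns the k most similar vectors.
// Real vector search systems swap this O(n) scan for an ANN index.
func topK(query []float64, corpus [][]float64, k int) []hit {
	hits := make([]hit, 0, len(corpus))
	for id, v := range corpus {
		hits = append(hits, hit{id, cosine(query, v)})
	}
	sort.Slice(hits, func(i, j int) bool { return hits[i].score > hits[j].score })
	if k > len(hits) {
		k = len(hits)
	}
	return hits[:k]
}

func main() {
	// Toy 3-dimensional "embeddings"; real ones are hundreds of dimensions.
	corpus := [][]float64{
		{1, 0, 0},
		{0, 1, 0},
		{0.9, 0.1, 0},
	}
	for _, h := range topK([]float64{1, 0, 0}, corpus, 2) {
		fmt.Printf("id=%d score=%.2f\n", h.id, h.score)
	}
}
```

The same retrieval primitive underpins items 4 and 5: a RAG pipeline embeds the user query, calls top-k retrieval over a document store, and feeds the hits to an LLM as context.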

Skills

Required

  • core database systems
  • large-scale distributed infrastructure
  • machine learning systems
  • C++
  • Rust
  • Go
  • storage engine architecture
  • vector retrieval systems
  • similarity search
  • ANN indexing
  • AI infrastructure
  • model-serving infrastructure
  • embeddings
  • RAG
  • LLMs
  • semantic search
  • agent systems
  • AI-native memory frameworks

Nice to have

  • Faiss
  • Milvus
  • DuckDB
  • ClickHouse
  • TiKV
  • RocksDB
  • GCP
  • AWS
  • Azure
  • memory-augmented models
  • self-tuning database components

What the JD emphasized

  • AI-native database systems
  • large models
  • AI workloads
  • AI agents
  • vector search systems
  • RAG infrastructure
  • LLM agent memory backends

Other signals

  • reasoning engines
  • retrieval platforms
  • memory for AI agents
  • embedding ingestion
  • multimodal retrieval
  • AI-augmented query processors
  • learned index structures
  • self-optimizing databases
  • AI-integrated transaction systems