Software Engineer/Researcher, AI-Native Database Systems

ByteDance · Big Tech · San Jose, CA · Infrastructure

This Software Engineer/Researcher role builds and owns AI-native database systems that act as reasoning engines, retrieval platforms, and real-time memory for AI agents. The work includes architecting systems that integrate structured, unstructured, and vectorized data; optimizing storage for embeddings; building scalable vector search; developing AI-augmented query processors that use LLMs; and collaborating on RAG infrastructure and LLM agent memory backends. Innovation in learned index structures and self-optimizing databases is also a key part of the role.

What you'd actually do

  1. Architect and implement AI-native databases that seamlessly integrate structured, unstructured, and vectorized data.
  2. Design storage engines optimized for embedding ingestion, multimodal retrieval, and real-time AI interaction.
  3. Build scalable and distributed vector search systems with low-latency guarantees.
  4. Develop AI-augmented query processors that leverage large language models (LLMs) for semantic parsing, intent understanding, and cost estimation.
  5. Collaborate on developing retrieval-augmented generation (RAG) infrastructure and LLM agent memory backends.

Skills

Required

  • C++
  • Rust
  • Go
  • Storage engine architecture
  • Vector retrieval systems
  • Similarity search
  • ANN indexing
  • AI infra or model-serving infrastructure
  • Embeddings
  • RAG
  • LLMs
  • Semantic search
  • Agent systems
  • AI-native memory frameworks

Nice to have

  • Faiss
  • Milvus
  • DuckDB
  • ClickHouse
  • TiKV
  • RocksDB
  • GCP
  • AWS
  • Azure
  • memory-augmented models
  • self-tuning database components

What the JD emphasized

  • AI-native database systems
  • large models
  • AI workloads
  • AI agents
  • embedding ingestion
  • multimodal retrieval
  • real-time AI interaction
  • vector search systems
  • low-latency guarantees
  • AI-augmented query processors
  • large language models (LLMs)
  • semantic parsing
  • intent understanding
  • retrieval-augmented generation (RAG) infrastructure
  • LLM agent memory backends
  • learned index structures
  • self-optimizing databases
  • AI-integrated transaction systems
  • core database systems
  • large-scale distributed infrastructure
  • machine learning systems

Other signals

  • AI-native database systems
  • reasoning engines
  • retrieval platforms
  • memory for AI agents
  • AI workloads
  • intelligent systems