Software Engineer / Researcher, AI-Native Database Systems

ByteDance · Big Tech · Seattle, WA · Infrastructure

The role focuses on building AI-native database systems that act as reasoning engines, retrieval platforms, and memory for AI agents. Responsibilities include architecting databases for structured, unstructured, and vectorized data; optimizing storage for embeddings and multimodal retrieval; building scalable vector search systems; developing AI-augmented query processors that use LLMs; and collaborating on RAG infrastructure and agent memory backends. The role also involves driving innovation in learned index structures and self-optimizing databases, with an emphasis on systems built for AI workloads.

What you'd actually do

  1. Architect and implement AI-native databases that seamlessly integrate structured, unstructured, and vectorized data.
  2. Design storage engines optimized for embedding ingestion, multimodal retrieval, and real-time AI interaction.
  3. Build scalable and distributed vector search systems with low-latency guarantees.
  4. Develop AI-augmented query processors that leverage large language models (LLMs) for semantic parsing, intent understanding, and cost estimation.
  5. Collaborate on developing retrieval-augmented generation (RAG) infrastructure and LLM agent memory backends.
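The retrieval work above (vector search, similarity ranking, RAG lookups) ultimately reduces to nearest-neighbor search over embeddings. A minimal sketch of such a store, assuming cosine similarity and a brute-force linear scan; the `VectorStore` class and its toy document IDs are illustrative, and at production scale the scan would be replaced by an ANN index (e.g., HNSW or IVF) to meet low-latency guarantees:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    """Toy in-memory vector store: exact top-k via an O(n) scan."""

    def __init__(self):
        self.items = []  # list of (id, embedding) pairs

    def add(self, item_id, embedding):
        self.items.append((item_id, embedding))

    def search(self, query, k=3):
        # Rank every stored vector against the query, keep the top k.
        scored = [(cosine(query, emb), item_id) for item_id, emb in self.items]
        scored.sort(reverse=True)
        return [item_id for _, item_id in scored[:k]]

store = VectorStore()
store.add("doc-a", [1.0, 0.0, 0.0])
store.add("doc-b", [0.9, 0.1, 0.0])
store.add("doc-c", [0.0, 1.0, 0.0])
print(store.search([1.0, 0.05, 0.0], k=2))  # → ['doc-a', 'doc-b']
```

The same lookup pattern underlies a RAG memory backend: embed the query, fetch the top-k nearest stored items, and feed them to the model as context.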

Skills

Required

  • C++
  • Rust
  • Go
  • Storage engine architecture
  • LSM-trees
  • Column stores
  • HTAP systems
  • Vector retrieval systems
  • Similarity search
  • ANN indexing
  • AI infrastructure
  • Model-serving infrastructure
  • Embeddings
  • RAG
  • LLMs
  • Semantic search
  • Agent systems
  • AI-native memory frameworks
  • Systems design
  • Large-scale distributed infrastructure
  • Machine learning systems

Nice to have

  • Faiss
  • Milvus
  • DuckDB
  • ClickHouse
  • TiKV
  • RocksDB
  • GCP
  • AWS
  • Azure
  • Memory-augmented models
  • Self-tuning database components

What the JD emphasized

  • AI-native database systems
  • Large models
  • AI workloads
  • Embedding ingestion
  • Multimodal retrieval
  • Vector search systems
  • Large language models (LLMs)
  • Retrieval-augmented generation (RAG)
  • LLM agent memory backends
  • Learned index structures
  • Self-optimizing databases
  • AI-integrated transaction systems
  • Core database systems
  • Large-scale distributed infrastructure
  • Machine learning systems
  • Storage engine architecture
  • Vector retrieval systems
  • Similarity search
  • ANN indexing
  • AI infrastructure or model-serving infrastructure
  • Embeddings
  • Semantic search
  • Agent systems
  • AI-native memory frameworks
  • Memory-augmented models
  • Self-tuning database components

Other signals

  • AI-native database systems
  • reasoning engines
  • retrieval platforms
  • memory for AI agents
  • embedding ingestion
  • multimodal retrieval
  • vector search systems
  • AI-augmented query processors
  • RAG infrastructure
  • LLM agent memory backends