Senior Software Engineer / Researcher, AI-native Database Systems

ByteDance · Big Tech · Seattle, WA · Infrastructure

This role focuses on building next-generation AI-native database systems that serve as reasoning engines, retrieval platforms, and memory for AI agents. The engineer/researcher will architect and implement systems that integrate structured, unstructured, and vectorized data; design storage engines optimized for embedding ingestion; build distributed vector search; develop AI-augmented query processors; and contribute to RAG infrastructure and LLM agent memory backends. The role also involves driving innovation in learned index structures and AI-integrated transaction systems, with opportunities for publication.

What you'd actually do

  1. Architect and implement AI-native databases that seamlessly integrate structured, unstructured, and vectorized data.
  2. Design storage engines optimized for embedding ingestion, multimodal retrieval, and real-time AI interaction.
  3. Build scalable and distributed vector search systems with low-latency guarantees.
  4. Develop AI-augmented query processors that leverage large language models (LLMs) for semantic parsing, intent understanding, and cost estimation.
  5. Collaborate on developing retrieval-augmented generation (RAG) infrastructure and LLM agent memory backends.

Skills

Required

  • core database systems
  • large-scale distributed infrastructure
  • machine learning systems
  • C++
  • Rust
  • Go
  • storage engine architecture
  • vector retrieval systems
  • similarity search
  • ANN indexing
  • AI infrastructure
  • model-serving infrastructure
  • embeddings
  • RAG
  • LLMs
  • semantic search
  • agent systems
  • AI-native memory frameworks

Nice to have

  • Faiss
  • Milvus
  • DuckDB
  • ClickHouse
  • TiKV
  • RocksDB
  • GCP
  • AWS
  • Azure
  • memory-augmented models
  • self-tuning database components

What the JD emphasized

  • AI-native database systems
  • reasoning engines
  • retrieval platforms
  • memory for AI agents
  • LLM agent memory backends
  • RAG infrastructure
  • vector search systems
  • embedding ingestion
  • multimodal retrieval
  • real-time AI interaction
  • AI-augmented query processors
  • semantic parsing
  • intent understanding
  • cost estimation
  • learned index structures
  • self-optimizing databases
  • AI-integrated transaction systems
