Software Engineer/Researcher, AI-Native Database Systems

ByteDance · Big Tech · San Jose, CA · Infrastructure

This Software Engineer/Researcher role builds and owns AI-native database systems that act as reasoning engines, retrieval platforms, and real-time memory for AI agents. The work involves architecting systems that integrate structured, unstructured, and vectorized data; optimizing storage for embeddings; building scalable vector search; developing LLM-augmented query processors; and collaborating on RAG infrastructure and LLM agent memory backends. The role also calls for innovation in learned index structures and self-optimizing databases.

What you'd actually do

Architect and implement AI-native databases that seamlessly integrate structured, unstructured, and vectorized data.
Design storage engines optimized for embedding ingestion, multimodal retrieval, and real-time AI interaction.
Build scalable and distributed vector search systems with low-latency guarantees.
Develop AI-augmented query processors that leverage large language models (LLMs) for semantic parsing, intent understanding, and cost estimation.
Collaborate on developing retrieval-augmented generation (RAG) infrastructure and LLM agent memory backends.

Skills

Required

C++
Rust
Go
Storage engine architecture
Vector retrieval systems
similarity search
ANN indexing
AI infra or model-serving infrastructure
embeddings
RAG
LLMs
Semantic search
agent systems
AI-native memory frameworks

Nice to have

Faiss
Milvus
DuckDB
ClickHouse
TiKV
RocksDB
GCP
AWS
Azure
RAG
memory-augmented models
self-tuning database components

What the JD emphasized

AI-native database systems
large models
AI workloads
AI agents
embedding ingestion
multimodal retrieval
real-time AI interaction
vector search systems
low-latency guarantees
AI-augmented query processors
large language models (LLMs)
semantic parsing
intent understanding
retrieval-augmented generation (RAG) infrastructure
LLM agent memory backends
learned index structures
self-optimizing databases
AI-integrated transaction systems
core database systems
large-scale distributed infrastructure
machine learning systems
Storage engine architecture
Vector retrieval systems
similarity search
ANN indexing
AI infra or model-serving infrastructure
embeddings
RAG
LLMs
Semantic search
agent systems
AI-native memory frameworks

Other signals

AI-native database systems
reasoning engines
retrieval platforms
memory for AI agents
AI workloads
intelligent systems

About the Team

Join ByteDance’s database R&D team, where you’ll build and own cutting-edge database products supporting ByteDance’s global infrastructure. Our diverse portfolio includes relational databases, distributed caches, key-value stores, document databases, graph databases, wide-column stores, search engines, and multi-model databases. In this role, you’ll have the opportunity to enhance these services in a cloud-native environment, embracing a culture of intellectual curiosity, self-direction, and problem-solving.

About the Role

We are building next-generation AI-native database systems: intelligent, multimodal, and designed for the era of large models. Our systems are not just data stores; they’re reasoning engines, retrieval platforms, and real-time memory for AI agents. As a Senior Software Engineer or Researcher, you will be at the forefront of rethinking how databases work when built from the ground up for AI workloads. You’ll help create infrastructure that powers intelligent systems across TikTok, CapCut, and future applications that haven’t been imagined yet.

Responsibilities

Architect and implement AI-native databases that seamlessly integrate structured, unstructured, and vectorized data.
Design storage engines optimized for embedding ingestion, multimodal retrieval, and real-time AI interaction.
Build scalable and distributed vector search systems with low-latency guarantees.
Develop AI-augmented query processors that leverage large language models (LLMs) for semantic parsing, intent understanding, and cost estimation.
Collaborate on developing retrieval-augmented generation (RAG) infrastructure and LLM agent memory backends.
Drive innovations in learned index structures, self-optimizing databases, and AI-integrated transaction systems.
Publish and contribute to broader research and open-source communities.

Requirements

Minimum Qualifications

Bachelor’s, Master’s, or Ph.D. in Computer Science or related fields with strong systems or AI research experience.
2+ years of experience in core database systems, large-scale distributed infrastructure, or machine learning systems.
Strong coding and system-level design skills in C++ / Rust / Go.
Deep expertise in one or more of the following areas: storage engine architecture (LSM-trees, column stores, HTAP systems); vector retrieval systems, similarity search, and ANN indexing; AI infra or model-serving infrastructure (especially for embeddings, RAG, and LLMs); semantic search, agent systems, or AI-native memory frameworks.
Ability to collaborate across research, engineering, and product teams to translate ideas into production systems.

Preferred Qualifications

Experience with open-source systems such as Faiss, Milvus, DuckDB, ClickHouse, TiKV, RocksDB.
Publications at conferences (e.g., SIGMOD, VLDB, NeurIPS, MLSys, ICDE).
Familiarity with GCP, AWS, or Azure’s database and AI integration strategies.
Prior contributions to RAG, memory-augmented models, or self-tuning database components.
