Software Engineer / Researcher, AI-Native Database Systems

ByteDance · Big Tech · Seattle, WA · Infrastructure

The role focuses on building AI-native database systems that act as reasoning engines, retrieval platforms, and memory for AI agents. Responsibilities include architecting databases for structured, unstructured, and vectorized data; optimizing storage for embeddings and multimodal retrieval; building scalable vector search systems; developing AI-augmented query processors that use LLMs; and collaborating on RAG infrastructure and agent memory backends. The role also involves driving innovation in learned index structures and self-optimizing databases, with an emphasis on systems built for AI workloads.

What you'd actually do

  1. Architect and implement AI-native databases that seamlessly integrate structured, unstructured, and vectorized data.
  2. Design storage engines optimized for embedding ingestion, multimodal retrieval, and real-time AI interaction.
  3. Build scalable and distributed vector search systems with low-latency guarantees.
  4. Develop AI-augmented query processors that leverage large language models (LLMs) for semantic parsing, intent understanding, and cost estimation.
  5. Collaborate on developing retrieval-augmented generation (RAG) infrastructure and LLM agent memory backends.
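The retrieval work above (vector search, similarity ranking, RAG lookups) ultimately reduces to nearest-neighbor search over embeddings. A minimal sketch of such a store, assuming cosine similarity and a brute-force linear scan; the `VectorStore` class and its toy document IDs are illustrative, and at production scale the scan would be replaced by an ANN index (e.g., HNSW or IVF) to meet low-latency guarantees:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    """Toy in-memory vector store: exact top-k via an O(n) scan."""

    def __init__(self):
        self.items = []  # list of (id, embedding) pairs

    def add(self, item_id, embedding):
        self.items.append((item_id, embedding))

    def search(self, query, k=3):
        # Rank every stored vector against the query, keep the top k.
        scored = [(cosine(query, emb), item_id) for item_id, emb in self.items]
        scored.sort(reverse=True)
        return [item_id for _, item_id in scored[:k]]

store = VectorStore()
store.add("doc-a", [1.0, 0.0, 0.0])
store.add("doc-b", [0.9, 0.1, 0.0])
store.add("doc-c", [0.0, 1.0, 0.0])
print(store.search([1.0, 0.05, 0.0], k=2))  # → ['doc-a', 'doc-b']
```

The same lookup pattern underlies a RAG memory backend: embed the query, fetch the top-k nearest stored items, and feed them to the model as context.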

Skills

Required

  • C++
  • Rust
  • Go
  • Storage engine architecture
  • LSM-trees
  • Column stores
  • HTAP systems
  • Vector retrieval systems
  • Similarity search
  • ANN indexing
  • AI infrastructure
  • Model-serving infrastructure
  • Embeddings
  • RAG
  • LLMs
  • Semantic search
  • Agent systems
  • AI-native memory frameworks
  • Systems design
  • Large-scale distributed infrastructure
  • Machine learning systems

Nice to have

  • Faiss
  • Milvus
  • DuckDB
  • ClickHouse
  • TiKV
  • RocksDB
  • GCP
  • AWS
  • Azure
  • Memory-augmented models
  • Self-tuning database components

What the JD emphasized

  • AI-native database systems
  • Large models
  • AI workloads
  • Embedding ingestion
  • Multimodal retrieval
  • Vector search systems
  • Large language models (LLMs)
  • Retrieval-augmented generation (RAG)
  • LLM agent memory backends
  • Learned index structures
  • Self-optimizing databases
  • AI-integrated transaction systems
  • Core database systems
  • Large-scale distributed infrastructure
  • Machine learning systems
  • Storage engine architecture
  • Vector retrieval systems
  • Similarity search
  • ANN indexing
  • AI infrastructure or model-serving infrastructure
  • Embeddings
  • Semantic search
  • Agent systems
  • AI-native memory frameworks
  • Memory-augmented models
  • Self-tuning database components

Other signals

  • AI-native database systems
  • reasoning engines
  • retrieval platforms
  • memory for AI agents
  • embedding ingestion
  • multimodal retrieval
  • vector search systems
  • AI-augmented query processors
  • RAG infrastructure
  • LLM agent memory backends