Senior Software Engineer - AI Agent Memory Infrastructure

ByteDance · Big Tech · San Jose, CA · R&D

ByteDance is hiring a Senior Software Engineer to build and evolve next-generation memory infrastructure for AI agents, with a focus on a unified platform for long-term, conversational, and task-oriented memory. The role involves architecting and optimizing large-scale, low-latency pipelines for data ingestion, storage, indexing, retrieval, and updating, at the intersection of LLMs, context engineering, and data management. Responsibilities also include designing unified memory models for multimodal data and collaborating with other teams to productionize these capabilities.

What you'd actually do

  1. Design, build, and evolve the next-generation memory infrastructure for AI agents, developing a unified platform that supports long-term memory, conversational memory, and task-oriented memory.
  2. Architect and optimize memory system pipelines for large-scale, low-latency, high-availability environments, covering data ingestion, storage, indexing, retrieval, updating, compression, and forgetting mechanisms, to support real-time inference and personalized interactions.
  3. Explore key challenges at the intersection of large language models, context engineering, and data management, including memory representation, retrieval and ranking, conflict resolution, summarization and fusion, and memory lifecycle management.
  4. Design unified memory models and processing workflows for multimodal data (text, image, audio, behavioral signals), enhancing agents’ long-term consistency, personalization, and task completion in complex scenarios.
  5. Collaborate closely with model, application, and platform teams to productionize memory capabilities, and continuously optimize system performance across quality, latency, cost, reliability, and safety.

Skills

Required

  • distributed systems
  • databases
  • information retrieval systems
  • AI infrastructure
  • system design
  • production engineering
  • Go
  • Python
  • C++
  • embeddings
  • retrieval-augmented generation (RAG)
  • context engineering
  • retrieval systems
  • long-term state management
  • memory extraction and representation
  • vector/graph indexing
  • retrieval and ranking
  • memory updating
  • compression and forgetting
  • multimodal memory fusion

Nice to have

  • agent memory systems
  • user profiling
  • recommendation/search feature platforms
  • knowledge base systems
  • mem0
  • memOS
  • memU
  • multimodal data processing
  • online inference systems
  • personalized agents
  • long-term user state modeling
  • system performance, latency, cost, and scalability trade-offs

What the JD emphasized

  • large-scale, low-latency, and highly reliable memory infrastructure
  • large-scale, low-latency, and high-availability environments
  • real-time inference
  • productionize memory capabilities
  • experience with complex production systems is highly preferred

Other signals

  • building core memory systems for AI agents
  • large-scale, low-latency, highly reliable memory infrastructure
  • intersection of LLMs, data systems, and context engineering