Tech Lead, Software Engineer - AI Agent Memory Infrastructure

ByteDance · Big Tech · San Jose, CA · R&D

This role focuses on building and scaling the core memory infrastructure for AI agents, enabling personalized and context-aware AI experiences. You would design and operate large-scale, low-latency systems for memory storage, retrieval, and optimization, working at the intersection of LLMs, data systems, and context engineering, with a particular focus on multimodal data fusion.

What you'd actually do

  1. Design, build, and evolve the next-generation memory infrastructure for AI agents, developing a unified platform that supports long-term memory, conversational memory, and task-oriented memory.
  2. Architect and optimize memory system pipelines for large-scale, low-latency, and high-availability environments, including data ingestion, storage, indexing, retrieval, updating, compression, and forgetting mechanisms to support real-time inference and personalized interactions.
  3. Explore key challenges at the intersection of large language models, context engineering, and data management, including memory representation, retrieval and ranking, conflict resolution, summarization and fusion, and memory lifecycle management.
  4. Design unified memory models and processing workflows for multimodal data (text, image, audio, behavioral signals), enhancing agents’ long-term consistency, personalization, and task completion in complex scenarios.
  5. Collaborate closely with model, application, and platform teams to productionize memory capabilities, and continuously optimize system performance across quality, latency, cost, reliability, and safety.
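The pipeline sketched in the responsibilities above (ingest → index → retrieve/rank → forget) can be illustrated with a minimal, self-contained Python sketch. All names, the bag-of-words "embedding", and the recency-decay forgetting policy are illustrative assumptions for the sake of the example, not the actual ByteDance system; a production system would use learned embeddings, a vector/graph index, and far richer update and compression mechanisms.

```python
# Toy sketch of an agent memory store: ingest, index, recency-weighted
# retrieval, and capacity-based forgetting. Illustrative only.
import math
import time
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class MemoryItem:
    text: str
    vector: Counter
    created_at: float = field(default_factory=time.time)


class MemoryStore:
    def __init__(self, capacity: int = 100, half_life_s: float = 3600.0):
        self.items: list[MemoryItem] = []
        self.capacity = capacity
        self.half_life_s = half_life_s  # recency-decay half-life (seconds)

    def ingest(self, text: str) -> None:
        """Ingest and index a memory; forget the oldest if over capacity."""
        self.items.append(MemoryItem(text, embed(text)))
        if len(self.items) > self.capacity:
            # Forgetting mechanism: drop the stalest item (LRU-style).
            self.items.sort(key=lambda m: m.created_at)
            self.items.pop(0)

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        """Rank by similarity decayed by age; return the top-k texts."""
        q = embed(query)
        now = time.time()

        def score(m: MemoryItem) -> float:
            decay = 0.5 ** ((now - m.created_at) / self.half_life_s)
            return cosine(q, m.vector) * decay

        return [m.text for m in sorted(self.items, key=score, reverse=True)[:k]]


store = MemoryStore(capacity=4)
store.ingest("user prefers dark mode in the app")
store.ingest("user asked about flight prices to Tokyo")
store.ingest("user is allergic to peanuts")
results = store.retrieve("user dark mode preference", k=1)
```

The same store shape extends naturally to the long-term, conversational, and task-oriented memory types the role mentions, e.g. by tagging each `MemoryItem` with a memory type and filtering at retrieval time.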

Skills

Required

  • distributed systems
  • databases
  • information retrieval systems
  • AI infrastructure
  • system design
  • production engineering
  • Go
  • Python
  • C++
  • LLM applications
  • embeddings
  • retrieval-augmented generation (RAG)
  • context engineering
  • retrieval systems
  • long-term state management
  • memory extraction and representation
  • vector/graph indexing
  • retrieval and ranking
  • memory updating
  • compression and forgetting
  • multimodal memory fusion

Nice to have

  • agent memory systems
  • user profiling
  • recommendation/search feature platforms
  • knowledge base systems
  • mem0
  • memOS
  • memU
  • multimodal data processing
  • online inference systems
  • personalized agents
  • long-term user state modeling
  • system performance optimization
  • latency optimization
  • cost optimization
  • scalability optimization

What the JD emphasized

  • large-scale, low-latency, and highly reliable memory infrastructure
  • large-scale, low-latency, and high-availability environments
  • real-time inference
  • multimodal data

Other signals

  • AI Agent Memory Infrastructure
  • unified platform for long-term, conversational, and task-oriented memory
  • intersection of LLMs, data systems, and context engineering
  • memory representation, retrieval, and multimodal fusion