Re / Rs - Foundations, Search

OpenAI · AI Frontier · San Francisco, CA · Research

Research role focused on embedding retrieval and agentic search, developing foundational technology for future frontier models. The work involves designing new embedding training objectives, scalable vector store architectures, and dynamic indexing methods, with potential for publication and integration into OpenAI products.

What you'd actually do

  1. Work on embedding models and retrieval systems optimized for grounding, relevance, and adaptive reasoning.
  2. Collaborate with a team of researchers and engineers building end-to-end infrastructure for training, evaluating, and integrating embeddings into frontier models.
  3. Drive innovation in dense, sparse, and hybrid representation techniques, metric learning, and learning-to-retrieve systems.
  4. Collaborate closely with Pretraining, Inference, and other Research teams to integrate retrieval throughout the model lifecycle.
  5. Contribute to OpenAI’s long-term vision of AI systems with memory and knowledge access capabilities rooted in learned representations.

Skills

Required

  • Deep technical expertise in representation learning, embedding models, or vector retrieval systems.
  • Familiarity with transformer-based LLMs and how embedding spaces can interact with language model objectives.
  • Research experience in areas such as contrastive learning, supervised or unsupervised embedding learning, or metric learning.
  • A track record of building or scaling large machine learning systems, particularly embedding pipelines in production or research contexts.
  • A first-principles mindset for challenging assumptions about how retrieval and memory should work for large models.

Nice to have

  • Proven experience leading high-performance teams of researchers or engineers in ML infrastructure or foundational research.

What the JD emphasized

  • foundational research
  • embedding retrieval
  • agentic search
  • frontier models
  • representation learning
  • embedding models
  • vector retrieval systems
  • transformer-based LLMs
  • embedding spaces
  • contrastive learning
  • supervised or unsupervised embedding learning
  • metric learning
  • embedding pipelines
  • retrieval and memory

Other signals

  • foundational technology
  • embedding retrieval
  • frontier models
  • agentic search
  • large-scale systems