Senior Software Engineer, Inference Platform

MongoDB · Enterprise · Seattle, WA · Atlas Search

We are seeking a Senior Software Engineer to build the next-generation inference platform for embedding models that power semantic search and AI-native experiences within MongoDB Atlas. The role focuses on the core systems and services behind real-time, low-latency, high-scale inference, in close collaboration with ML researchers and engineers.

What you'd actually do

  1. Design and build components of a multi-tenant inference platform integrated directly with MongoDB Atlas, supporting semantic search and hybrid retrieval
  2. Collaborate with AI engineers and researchers to productionize inference for embedding models and rerankers — enabling both batch and real-time use cases
  3. Contribute to platform capabilities such as latency-aware routing, model versioning, health monitoring, and observability
  4. Improve performance, autoscaling, GPU utilization, and resource efficiency in a cloud-native environment
  5. Work across product, infrastructure, and ML teams to ensure the inference platform meets the scale, reliability, and latency demands of Atlas users

Skills

Required

  • 5+ years of experience building backend or infrastructure systems at scale
  • Strong software engineering skills in languages such as Go, Rust, Python, or C++, with an emphasis on performance and reliability
  • Experience with cloud-native architectures, distributed systems, and multi-tenant service design
  • Familiarity with ML model serving concepts and inference runtimes

Nice to have

  • Knowledge of vector search systems (e.g., Faiss, HNSW, ScaNN)
  • Experience integrating infrastructure with production ML workloads
  • Understanding of hybrid retrieval, prompt-driven systems, or retrieval-augmented generation (RAG)
  • Contributions to open-source infrastructure for ML serving or search

What the JD emphasized

  • productionize inference
  • low-latency
  • high-scale inference
  • cloud-native environment
  • reliability
  • latency demands

Other signals

  • inference platform
  • embedding models
  • semantic search