Machine Learning Engineer III, Search Relevance

Box · Enterprise · Redwood City, CA · Core Platform

This is a Machine Learning Engineer III role on the Search Relevance team at Box, focused on improving search quality end to end (signals, ranking, retrieval, and evaluation) by building scalable, low-latency services. The work includes shipping production features that use embeddings, semantic/hybrid search, and LLM-enabled retrieval; contributing to offline/online evaluation and A/B tests; and building reliable microservices and near real-time indexing pipelines.

What you'd actually do

  1. Design, build, and iterate on components for ranking, retrieval, and recommendations that improve measurable relevance and latency.
  2. Implement production features leveraging embeddings, semantic/hybrid search, and LLM-enabled retrieval under mentorship and design guidance.
  3. Contribute to offline/online evaluation, A/B tests, and relevance tuning using metrics such as NDCG, MRR, and precision@k.
  4. Develop reliable, observable microservices and near real-time indexing pipelines across distributed systems.
  5. Own well-scoped projects from design to rollout, writing clear design docs, tests, and operational runbooks.
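
For readers less familiar with the relevance metrics named in item 3, here is a minimal sketch of how precision@k, MRR, and NDCG are commonly computed from graded relevance labels. The function names are illustrative and not taken from the posting or from any Box codebase. The sketch assumes the linear-gain form of DCG (some teams use the exponential form, 2^rel - 1), assumes the ideal ranking is built only from the judged results passed in, and assumes precision@k divides by k even when fewer than k results are returned.

```python
import math


def precision_at_k(relevances, k):
    """Fraction of the top-k positions holding a relevant result (label > 0)."""
    top = relevances[:k]
    return sum(1 for r in top if r > 0) / k


def reciprocal_rank(relevances):
    """1 / rank of the first relevant result, or 0.0 if none is relevant."""
    for rank, r in enumerate(relevances, start=1):
        if r > 0:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries):
    """Average reciprocal rank over a non-empty list of per-query label lists."""
    return sum(reciprocal_rank(q) for q in queries) / len(queries)


def dcg_at_k(relevances, k):
    """Discounted cumulative gain with linear gain and a log2 position discount."""
    return sum(
        r / math.log2(i + 1)
        for i, r in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the same labels sorted best-first."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical labels for one query, in ranked order (0 = not relevant).
ranked_labels = [0, 2, 1, 0, 3]
print(precision_at_k(ranked_labels, 3))
print(reciprocal_rank(ranked_labels))
print(ndcg_at_k(ranked_labels, 5))
```

In an offline evaluation these per-query scores would typically be averaged over a judged query set and compared between a baseline ranker and a candidate before the candidate goes to an A/B test.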

Skills

Required

  • 3+ years of industry experience building backend or distributed systems.
  • Production ownership of services or data pipelines.
  • Proficiency in at least one of: Java, Scala, C++, or Python.
  • Experience with data pipelines, message queues, or streaming systems (e.g., Kafka, Pub/Sub) and near real-time processing.
  • Familiarity with cloud-native microservices, CI/CD, observability, and performance tuning.
  • BS in Computer Science or related field, or equivalent practical experience.

Nice to have

  • Comfortable writing production-grade Python.
  • Exposure to search, ranking, recommendations, or applied ML in production.
  • Understanding of the basics of training-to-serving workflows.
  • Experience with Elasticsearch, Solr, Lucene, or custom search systems; understanding of inverted indexes and scoring functions.
  • Knowledge of relevance tuning, learning-to-rank concepts, and offline/online experimentation practices.
  • Exposure to vector search, dense/sparse embeddings, and hybrid retrieval architectures.
  • Familiarity with IR fundamentals (BM25, TF-IDF, multi-stage retrieval) and query understanding.
  • Experience with Kubernetes/Terraform and a major cloud (GCP/AWS/Azure).
  • Practical exposure to PyTorch or TensorFlow; LLM familiarity helpful but not required.

What the JD emphasized

  • production ownership of services or data pipelines
  • production-grade Python
  • applied ML in production
  • near real-time processing
  • performance tuning
  • custom search systems
  • relevance tuning
  • learning-to-rank concepts
  • offline/online experimentation practices
  • vector search
  • dense/sparse embeddings
  • hybrid retrieval architectures
  • IR fundamentals
  • query understanding

Other signals

  • improving search quality end-to-end
  • building scalable, low-latency services
  • productionize modern retrieval techniques
  • experimentation frameworks