WHAT IS BOX?

Box (NYSE:BOX) is the leader in Intelligent Content Management. Our platform enables organizations to fuel collaboration, manage the entire content lifecycle, secure critical content, and transform business workflows with enterprise AI. We help companies thrive in the new AI-first era of business. Founded in 2005, Box simplifies work for leading global organizations, including JLL, Morgan Stanley, and Nationwide. Box is headquartered in Redwood City, CA, with offices across the United States, Europe, and Asia.

By joining Box, you will have the unique opportunity to continue driving our platform forward. Content powers how we work. It’s the billions of files and information flowing across teams, departments, and key business processes every single day: contracts, invoices, employee records, financials, product specs, marketing assets, and more. Our mission is to bring intelligence to the world of content management and empower our customers to completely transform workflows across their organizations. With the combination of AI and enterprise content, the opportunity has never been greater to transform how the world works together, and at Box you will be on the front lines of this massive shift.

**WHY BOX NEEDS YOU**

AI is transforming how enterprises work, and Box is building an enterprise-grade Agents Platform at the core of the Box Content Cloud. Our platform, built on LangGraph, enables teams across Box and our customers to design, deploy, and operate AI agents that handle real-world enterprise workflows—from content understanding and generation to intelligent metadata, automation, and complex, multi-step orchestrations.

As a founding ML Engineer on the Core Agents team, you will build and evaluate the foundational agents that power the Box AI ecosystem, including DeepSearch, DeepResearch, Extract, and Compose. You’ll design techniques for intent detection, ranking, evaluation, retrieval-augmented generation (RAG), and multi-agent orchestration, while also establishing metrics and evaluation frameworks to measure agent quality.

Your work will shape how agents retrieve, reason, and act on enterprise content with high accuracy and trustworthiness. You’ll collaborate closely with platform engineers to build the core components of the Agents Platform that enable these agents to run at scale, while also empowering other Box teams and customers to configure and customize agents for their workflows.

**WHAT YOU'LL DO**

Build, evaluate, and evolve foundational agents such as DeepSearch, DeepResearch, Extract, and Compose.
Develop techniques for intent detection, query understanding, ranking, and RAG to improve accuracy and relevance.
Define metrics, evaluation pipelines, and benchmarks for agent quality, including precision/recall, factual grounding, and latency trade-offs.
Research and implement best practices in retrieval, orchestration, and evaluation of multi-agent workflows.
Collaborate with platform engineers to design core components that enable secure, reliable, and scalable deployment of agents.
Partner with product teams to translate enterprise use cases into agentic solutions, ensuring measurable improvements in user experience.
Contribute to technical discussions, share research insights, and help define the roadmap for Box’s agent ecosystem.

**WHO YOU ARE**

You are passionate about building and evaluating AI agents that solve enterprise problems.
You enjoy working at the intersection of machine learning and distributed systems, bridging research with production.
You’ve designed or evaluated ML systems for search, ranking, RAG, or conversational AI.
You like to be an owner and strive to do work you’re proud of—both technically and in your team interactions.
You are collaborative, curious, and comfortable mentoring or learning from other engineers and ML practitioners.

Must Have Experience

3+ years of industry experience building or evaluating ML-powered systems.
MS or PhD degree in Machine Learning, Computer Science, or a related field.
Strong background in machine learning, information retrieval, or natural language processing.
Proficiency with at least one programming language such as Python, Java, or Scala.
Experience designing, training, and evaluating ML models in production.
Familiarity with retrieval systems, ranking models, RAG pipelines, or intent classification.

Nice To Have Experience

Advanced degree in computer science, machine learning, or a related field.
Hands-on experience with LangChain, LangGraph, or other agent frameworks.
Familiarity with LLMs, embeddings, semantic search, indexing, and relevance optimization.
Experience with cloud-based ML platforms such as Vertex AI, AWS Bedrock, or SageMaker.
Experience with Kubernetes-based systems for deploying and scaling ML workloads.
Research or applied experience in evaluation of generative AI systems (factuality, safety, grounding).

Box lives its values, with community and in-person collaboration being a core part of our culture. Boxers are expected to work from their assigned office a minimum of 3 days per week. Your recruiter will share more about how we work and our company culture during the hiring process.

At Box, we believe unique and diverse experiences benefit our culture, our products, our customers, our company, and our world. We aim to recruit a passionate, high-performing workforce that reflects the world we live in. If you are head over heels about this role but unsure whether you meet all the requirements, we encourage you to apply!

EQUAL OPPORTUNITY

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, disability, or any other protected ground of discrimination under applicable human rights legislation. Box strives to respect the dignity and independence of people with disabilities and is committed to giving them the same opportunity to succeed as all other employees. Inclusiveness is core to our culture at Box, and we strive to ensure you get the most from your interview experience.

Box makes reasonable accommodations for applicants with disabilities. If a reasonable accommodation is needed to participate in the job application or interview process, please complete this form. Reasonable accommodations may include scheduling adjustments, document dictation, and more.

Notice to applicants in Los Angeles: Box, Inc. and its related branches will consider qualified applicants with criminal histories for employment in a manner consistent with the Los Angeles Fair Chance Ordinance. The Fair Chance Ordinance is provided here.

Notice to applicants in San Francisco: Box, Inc. and its related branches will consider qualified applicants with criminal histories for employment in a manner consistent with the San Francisco Fair Chance Ordinance. The Fair Chance Ordinance is provided here.

For details on how we protect your information when you apply, please see our Personnel Privacy Notice. If you are a California resident, please read our California Applicant & Candidate Privacy Notice here.

Box is committed to fair and equitable compensation practices. Actual base salary (or OTE for commissionable roles) depends on factors such as knowledge, skill level, experience, and work location. This role is also eligible for equity and benefits. For more information, check out our benefits and perks.

In accordance with OFCCP compliance, here is the Pay Transparency Provision.

United States Pay Range

$175,500–$219,500 USD