Staff Machine Learning Engineer

Cresta · Vertical AI · United States · Remote · Engineering

Staff Machine Learning Engineer at Cresta, focused on building and scaling agentic AI systems, evaluating LLM agent performance, and architecting RAG pipelines for enterprise data. The role involves leading the technical vision, designing multi-agent orchestration, and ensuring the robustness, reliability, and cost-efficiency of LLM systems.

What you'd actually do

  1. Define and lead the technical vision for Cresta’s next-generation Agentic AI systems, including Agentic Assist and enterprise AI Agents.
  2. Architect scalable, production-grade LLM systems that integrate reasoning, retrieval, planning, tool use, and real-time decision-making into cohesive, intelligent workflows.
  3. Design and evolve multi-agent orchestration frameworks that combine RAG, structured knowledge, domain-adapted models, and automated actions.
  4. Establish best practices for building robust, reliable, and cost-efficient LLM-powered systems in high-scale production environments.
  5. Own evaluation strategy for complex, non-deterministic AI systems, including offline benchmarking, online experimentation, LLM-as-a-judge methodologies, and systematic failure analysis.

Skills

Required

  • Bachelor’s degree in Computer Science, Mathematics, or a related field
  • 7+ years of experience building and deploying machine learning systems in production
  • Deep hands-on experience with LLMs at scale
  • Demonstrated leadership in architecting complex AI systems, particularly agentic or multi-step LLM workflows
  • Deep expertise in transformer-based models, embeddings, retrieval systems, and Retrieval-Augmented Generation (RAG) pipelines
  • Experience designing evaluation frameworks for LLM systems beyond single-turn prompts, including robustness testing and production monitoring
  • Strong systems thinking: ability to design for scalability, latency constraints, cost efficiency, security, and long-term maintainability
  • Extensive experience with modern ML frameworks (e.g., PyTorch, TensorFlow, Hugging Face) and distributed/cloud-based infrastructure
  • Proven ability to influence technical direction across teams as a senior individual contributor
  • A strong bias toward action — able to prototype rapidly while maintaining production rigor

Nice to have

  • Master’s or Ph.D. strongly preferred

What the JD emphasized

  • deep expertise in LLMs and modern prompting techniques
  • proven ability to translate cutting-edge research into scalable, production-grade systems
  • diagnosing and mitigating failure modes such as hallucinations, retrieval errors, tool misuse, context drift, prompt brittleness, and multi-step reasoning breakdowns
  • defining measurable quality metrics (e.g., accuracy, faithfulness, task completion, latency, and cost) for complex, non-deterministic systems
  • architect scalable, production-grade LLM systems
  • design and evolve multi-agent orchestration frameworks
  • establish best practices for building robust, reliable, and cost-efficient LLM-powered systems
  • own evaluation strategy for complex, non-deterministic AI systems
  • proactively identify and mitigate agent failure modes
  • define measurable quality standards
  • deep hands-on experience with LLMs at scale
  • demonstrated leadership in architecting complex AI systems, particularly agentic or multi-step LLM workflows
  • experience designing evaluation frameworks for LLM systems beyond single-turn prompts, including robustness testing and production monitoring
  • strong systems thinking: ability to design for scalability, latency constraints, cost efficiency, security, and long-term maintainability

Other signals

  • building next-generation agentic AI systems
  • design evaluation frameworks and improve the reliability, robustness, and performance of LLM-powered agents
  • architect and scale LLM and retrieval-augmented generation pipelines