Lead Machine Learning Engineer

Capital One · Banking · Cambridge, MA +2

Lead Machine Learning Engineer at Capital One, focused on building and deploying AI-powered risk management solutions. The role involves designing, developing, testing, and deploying AI software components, including LLM inference, similarity search, guardrails, governance, observability, and agentic AI. Responsibilities include fine-tuning, developing, and evaluating ML and foundation models, contributing to technical vision, and leveraging a broad stack of AI technologies. The role also requires retraining, maintaining, and monitoring production models, constructing optimized data pipelines, and ensuring responsible and explainable AI practices.

What you'd actually do

  1. Design, develop, test, deploy, and support AI software components utilizing machine learning models, including model evaluation and experimentation, large language model inference, similarity search, guardrails, governance, observability, and agentic AI.
  2. Fine-tune, develop, and evaluate machine learning and foundation models.
  3. Retrain, maintain, and monitor models in production.
  4. Construct optimized data pipelines to feed ML models.
  5. Ensure all code is well-managed to reduce vulnerabilities, models are well-governed from a risk perspective, and the ML follows best practices in Responsible and Explainable AI.

Skills

Required

  • Python, Scala, or Java
  • designing and building data-intensive solutions using distributed computing
  • machine learning models
  • model evaluation and experimentation
  • large language model inference
  • similarity search
  • guardrails
  • governance
  • observability
  • agentic AI
  • fine-tuning
  • developing and evaluating machine learning and foundation models
  • Open Source and SaaS AI technologies
  • ML modeling techniques
  • ML infrastructure decisions
  • retraining, maintaining, and monitoring models in production
  • optimized data pipelines
  • Responsible and Explainable AI

Nice to have

  • collaboration with cross-functional teams
  • communication of complex technical concepts
  • staying abreast of the latest research
  • applying novel techniques in production
  • problem-solving
  • engineering and mathematics foundation
  • hardware and software expertise
  • strategic thinking
  • business needs understanding
  • Agile methodologies

What the JD emphasized

  • state-of-the-art AI technology
  • agentic AI
  • Responsible and Explainable AI

Other signals

  • building and deploying proprietary solutions for Risk management that are powered by state-of-the-art AI technology
  • design, develop, test, deploy, and support AI software components utilizing machine learning models
  • large language model inference, similarity search, guardrails, governance, observability and agentic AI
  • fine-tune, develop and evaluate machine learning and foundation models
  • contribute thought leadership and technical vision to the long term roadmap of pioneering AI systems