Sr. Lead AI Engineer (Inference Optimization, FM Hosting, AI Platform)

Capital One · Banking · San Jose, CA +4

This role focuses on optimizing the performance, scalability, cost, and latency of large-scale production AI systems, particularly foundation model training and large language model inference. It involves designing, developing, and deploying AI software components, including inference services, and contributing to the AI platform. The role also touches on agentic systems through guardrails and similarity search.

What you'd actually do

  1. Design, develop, test, deploy, and support AI software components, including foundation model training, large language model inference, similarity search, guardrails, model evaluation, experimentation, governance, and observability.
  2. Invent and introduce state-of-the-art LLM optimization techniques to improve the performance (scalability, cost, latency, throughput) of large-scale production AI systems.
  3. Contribute to the technical vision and the long-term roadmap of foundational AI systems at Capital One.
  4. Leverage a broad stack of open source and SaaS AI technologies such as AWS UltraClusters, Hugging Face, VectorDBs, NeMo Guardrails, PyTorch, and more.

Skills

Required

  • Python
  • Go
  • Scala
  • Java
  • Computer Science
  • AI
  • Electrical Engineering
  • Computer Engineering

Nice to have

  • AWS
  • Google Cloud
  • Azure
  • Hugging Face
  • VectorDBs
  • NeMo Guardrails
  • PyTorch
  • C++
  • C#
  • Golang
  • LLM Inference
  • Similarity Search
  • Guardrails
  • Memory
  • Optimizing training and inference software
  • Hardware utilization
  • Latency
  • Throughput
  • Cost
  • AI research
  • AI systems
  • Communication
  • Presentation

What the JD emphasized

  • optimize the performance (scalability, cost, latency, throughput) of large-scale production AI systems
  • foundation model training
  • large language model inference
  • AI platform

Other signals

  • inference optimization
  • foundation model training
  • large language model inference
  • AI platform