Lead AI Engineer (FM Hosting, LLM Inference)

Capital One · Banking · New York, NY +3

Lead AI Engineer focused on optimizing LLM inference for scalability, cost, and latency within an enterprise AI setting. The role involves designing, developing, and deploying AI software components, including foundation model training, inference services, similarity search, guardrails, and model evaluation, leveraging cloud platforms and various AI technologies.

What you'd actually do

  1. Design, develop, test, deploy, and support AI software components including foundation model training, large language model inference, similarity search, guardrails, model evaluation, experimentation, governance, and observability.
  2. Invent and introduce state-of-the-art LLM optimization techniques to improve the performance (scalability, cost, latency, throughput) of large-scale production AI systems.
  3. Contribute to the technical vision and the long-term roadmap of foundational AI systems at Capital One.
  4. Partner with a cross-functional team of engineers, research scientists, technical program managers, and product managers to deliver AI-powered products that change how our associates work and how our customers interact with Capital One.

Skills

Required

  • Python
  • Go
  • Scala
  • Java
  • AI and ML algorithms
  • Experience developing AI and ML algorithms or technologies
  • Experience programming with Python, Go, Scala, or Java

Nice to have

  • Experience deploying scalable and responsible AI solutions on cloud platforms (e.g. AWS, Google Cloud, Azure, or equivalent private cloud)
  • Experience designing, developing, delivering, and supporting AI services
  • Experience developing AI and ML algorithms or technologies (e.g. LLM Inference, Similarity Search and VectorDBs, Guardrails, Memory) using Python, C++, C#, Java, or Golang
  • Experience developing and applying state-of-the-art techniques for optimizing training and inference software to improve hardware utilization, latency, throughput, and cost
  • Passion for staying abreast of the latest AI research and AI systems, and for judiciously applying novel techniques in production

What the JD emphasized

  • AI software components including foundation model training, large language model inference, similarity search, guardrails, model evaluation, experimentation, governance, and observability
  • state-of-the-art LLM optimization techniques
  • scalability, cost, latency, throughput
  • foundational AI systems
  • deploying scalable and responsible AI solutions on cloud platforms
  • LLM Inference, Similarity Search and VectorDBs, Guardrails, Memory
  • optimizing training and inference software

Other signals

  • LLM Inference
  • AI Infrastructure
  • Optimization
  • Scalability
  • Cost reduction