Senior Associate, Data Scientist - US Card (Applied GenAI)

Capital One · Banking · McLean, VA (+1 other location)

This role focuses on applying generative AI and LLMs to unstructured data (text, image) for enterprise applications in areas like customer servicing and document processing. It involves building and operationalizing ML/NLP models, assessing production architectures, defining AI observability, and evaluating AI system behavior. The role emphasizes practical application and scaling AI adoption within a regulated financial environment.

What you'd actually do

  1. Apply expertise in unstructured data (text and images) to harness open-source large language models (LLMs) and visual language models (VLMs).
  2. Leverage a broad technology stack (LangGraph, LlamaIndex, Weights & Biases Weave, Hugging Face, PyTorch, AWS, and more) to automate workflows over large volumes of text and vision data.
  3. Build machine learning and NLP models through all phases of development, from design through training, evaluation, and validation, and partner with engineering teams to operationalize them in scalable, resilient production systems that serve 80+ million customers.
  4. Assess GenAI and LLM-powered application architectures in production, including best practices for generative AI development and deployment.
  5. Define requirements for AI observability, focusing on the traceability of autonomous decisions and comprehensive system audit trails.

Skills

Required

  • Python
  • Scala
  • R
  • machine learning
  • clustering
  • classification
  • sentiment analysis
  • time series
  • deep learning
  • AWS

Nice to have

  • Master’s degree in a STEM field
  • PhD in a STEM field

What the JD emphasized

  • operationalize them in scalable and resilient production systems
  • serve 80+ million customers
  • Assessing GenAI or LLM-Powered application architectures in production
  • best practices for Generative AI development and deployments
  • Define requirements for AI observability
  • traceability of autonomous decisions
  • comprehensive system audit trails
  • Evaluate the dynamic behavior of AI systems
  • continuous monitoring controls and testing
  • non-deterministic outputs and autonomous actions remain within risk appetite

Other signals

  • Applied GenAI
  • open source generative AI models
  • scale the adoption of AI
  • embedding AI in varied domains
  • applying generative AI on millions of inputs
  • customer experience
  • extracting key information from unstructured documents
  • analyzing call transcripts
  • customer friction
  • best-in-class products and experiences powered by the latest emerging generative AI technologies
  • internal business processes and data operations
  • guiding annotators to curate high-quality, consistent datasets for model training, evaluation, and ongoing AI monitoring
  • team of data scientists through all phases of project development, from design through training, evaluation, validation, implementation, and maintenance
  • Interact with a variety of internal stakeholders
  • alignment of data science solutions with business outcomes
  • Customer first
  • Innovative: continually research and evaluate emerging technologies; stay current on published state-of-the-art methods, technologies, and applications; seek out opportunities to apply them
  • Creative: thrive on bringing definition to big, undefined problems; asking questions and pushing hard to find answers; not afraid to share a new idea
  • A leader: challenge conventional thinking; work with stakeholders to identify and improve the status quo; passionate about talent development for your own team and beyond
  • Technical: comfortable with open-source languages; passionate about developing further; hands-on experience developing data science solutions using open-source tools and cloud computing platforms
  • Statistically minded: built models, validated them, and backtested them; know how to interpret a confusion matrix or a ROC curve; experience with clustering, classification, sentiment analysis, time series, and deep learning
  • A data guru: big data doesn’t faze you; skills to retrieve, combine, and analyze data from a variety of sources and structures; understanding the data is often the key to great data science