Principal Associate, Data Scientist - People Strategy & Analytics

Capital One · Banking · McLean, VA

Data Scientist role focused on applying AI/ML, specifically LLMs with RAG and prompt engineering, to HR talent decisions. The role involves building NLP and ML models through all development phases, partnering with cross-functional teams, and leveraging technologies like Python, SQL, AWS, LangChain, Hugging Face, VectorDBs, and PyTorch/TensorFlow. The ideal candidate is innovative, creative, technical, and statistically minded.

What you'd actually do

  1. Build natural language processing and machine learning models through all phases of development, from design through training, evaluation, validation, and implementation
  2. Apply expertise in using open source large language models (LLMs) through prompt engineering, retrieval-augmented generation (RAG), and evaluation metric frameworks for business-specific applications
  3. Partner with a cross-functional team of data scientists, software engineers, business analysts, and product managers to deliver industry-leading HR tools and AI-powered products
  4. Leverage a broad stack of technologies (Python, SQL, AWS, LangChain, Hugging Face Transformers, VectorDBs, PyTorch/TensorFlow, and more) to reveal the insights hidden within large volumes of numeric and textual data
  5. Flex your interpersonal skills to collaborate with internal stakeholders, translating complex data science work into tangible, aligned business outcomes

Skills

Required

  • Bachelor's Degree in a quantitative field plus 5 years of experience performing data analytics, or
  • Master's Degree in a quantitative field plus 3 years of experience performing data analytics, or
  • PhD in a quantitative field
  • Python
  • SQL
  • Machine learning
  • Relational databases
  • AI/ML tools and ecosystems such as Hugging Face, VectorDBs, or PyTorch/TensorFlow

Nice to have

  • Master's Degree in a STEM field plus 3 years of experience in data analytics
  • PhD in a STEM field
  • AWS
  • Snowflake

What the JD emphasized

  • Build natural language processing and machine learning models through all phases of development
  • Apply expertise in using open source large language models (LLMs) through prompt engineering, retrieval-augmented generation (RAG), and evaluation metric frameworks
  • Leverage a broad stack of technologies: Python, SQL, AWS, LangChain, Hugging Face Transformers, VectorDBs, PyTorch/TensorFlow

Other signals

  • Applying artificial intelligence, machine learning, and social science to build models