Manager, Data Science - Emerging ML

Capital One · Banking · McLean, VA +2

Capital One's Emerging ML team is seeking a Manager, Data Science to conduct research and development in AI, focusing on embeddings and foundation models. The role involves building machine learning models from design through production, partnering with product and engineering teams, and analyzing large-scale customer data using tools like Spark and AWS. The ideal candidate is curious, technical, statistically minded, and customer-oriented, with hands-on experience across the data science lifecycle and with open-source tools.

What you'd actually do

  1. Build machine learning models through all phases of development, from design through training, evaluation and validation, and partner with engineering teams to operationalize them in scalable and resilient production systems that serve 50+ million customers.
  2. Partner closely with a variety of business and product teams across Capital One to conduct the experiments that guide improvements to customer experiences and business outcomes in domains like marketing, servicing and fraud prevention.
  3. Write software (e.g., Python, Scala) to collect, explore, visualize and analyze numerical and textual data (billions of customer transactions, clicks, payments, etc.) using tools like Spark and AWS.

Skills

Required

  • Bachelor's Degree in a quantitative field plus 6 years of experience performing data analytics OR Master's Degree in a quantitative field plus 4 years of experience performing data analytics OR PhD in a quantitative field plus 1 year of experience performing data analytics
  • 1 year of experience leveraging open source programming languages for large scale data analysis
  • 1 year of experience working with machine learning
  • 1 year of experience utilizing relational databases

Nice to have

  • PhD in a STEM field
  • Experience working with AWS
  • 4 years’ experience in Python, Scala, or R
  • 4 years’ experience with machine learning
  • 4 years’ experience with SQL

What the JD emphasized

  • petabytes of data

Other signals

  • foundation models
  • embeddings
  • self-supervised learning
  • transformer models
  • representation learning
  • customer behavioral models
  • encoder and decoder models