Principal Associate, Data Scientist - L… at Capital One

What you'd actually do

Partner with a cross-functional team of data scientists, software engineers, machine learning engineers, and product managers to deliver AI-powered products that change how customers interact with their money.

Leverage a broad stack of technologies (PyTorch, AWS Ultraclusters, Hugging Face, LangChain, Lightning, VectorDBs, and more) to reveal the insights hidden within huge volumes of numeric and textual data.

Be the expert in Natural Language Processing (NLP) to harness the power of Large Language Models (LLMs), and adapt and fine-tune them for customer-facing applications and features.

Build machine learning and NLP models through all phases of development, from design through training, evaluation, and validation, and partner with engineering teams to operationalize them in scalable and resilient production systems that serve 80+ million customers.

Flex your interpersonal skills to translate the complexity of your work into tangible business goals.

What the JD emphasized

delivering models at scale both in training data and inference volumes

experience in delivering libraries, platforms, or solution-level code to existing products

training language models or large computer vision models

expertise in one or more key subdomains such as: training optimization, self-supervised learning, explainability, RLHF

Other signals

LLM Customization team is on the cutting edge of GenAI

AI Training Team touches every aspect of the model development life cycle

deployed models in production drive business impact

build capabilities for evaluating and monitoring generative models

build search, summarization, RAG, and agentic workflows for integration in production applications

Principal Associate, Data Scientist - LLM Customization Team

Data is at the center of everything we do. As a startup, we disrupted the credit card industry by individually personalizing every credit card offer using statistical modeling and the relational database, cutting-edge technology in 1988! Fast-forward a few years, and this little innovation and our passion for data have skyrocketed us to a Fortune 200 company and a leader in the world of data-driven decision-making.

As a Data Scientist at Capital One, you’ll be part of a team that’s leading the next wave of disruption at a whole new scale, using the latest in computing and machine learning technologies and operating across billions of customer records to unlock the big opportunities that help everyday people save money, time and agony in their financial lives.

Team Description

The LLM Customization team is on the cutting edge of GenAI and at the center of bringing our vision for AI at Capital One to life. The work of the AI Training Team touches every aspect of the model development life cycle, and our deployed models in production drive business impact with visibility from our C-Suite.

Our team creates unprecedented amounts of high-quality data for training and testing GenAI models; we care about how it’s created, what’s in those datasets, and the impact they have.
We are invested in building capabilities for evaluating and monitoring generative models; these methods must be state of the art, easy to use, and trusted by our users and contributors.
Horizontal capabilities enable vertical use case work; the team builds search, summarization, RAG, and agentic workflows for integration in production applications across the company.
We learn from our colleagues, attend conferences, publish papers, and maintain strong connections to the research community.

In this role, you will:

Partner with a cross-functional team of data scientists, software engineers, machine learning engineers, and product managers to deliver AI-powered products that change how customers interact with their money.
Leverage a broad stack of technologies (PyTorch, AWS Ultraclusters, Hugging Face, LangChain, Lightning, VectorDBs, and more) to reveal the insights hidden within huge volumes of numeric and textual data.
Be the expert in Natural Language Processing (NLP) to harness the power of Large Language Models (LLMs), and adapt and fine-tune them for customer-facing applications and features.
Build machine learning and NLP models through all phases of development, from design through training, evaluation, and validation, and partner with engineering teams to operationalize them in scalable and resilient production systems that serve 80+ million customers.
Flex your interpersonal skills to translate the complexity of your work into tangible business goals.

The Ideal Candidate is:

Customer first. You love the process of analyzing and creating, but also share our passion to do the right thing. You know at the end of the day it’s about making the right decision for our customers.
Innovative. You continually research and evaluate emerging technologies. You stay current on published state-of-the-art methods, technologies, and applications and seek out opportunities to apply them.
Creative. You thrive on bringing definition to big, undefined problems. You love asking questions and pushing hard to find answers. You’re not afraid to share a new idea.
A leader. You challenge conventional thinking and work with stakeholders to identify and improve the status quo. You're passionate about talent development for your own team and beyond.
Technical. You’re comfortable with advanced ML and DL technologies including language models and are passionate about developing further. You have hands-on experience working with LLMs and solutions using open-source tools and cloud computing platforms.
Influential. You are passionate about AI/ML and can bring along a cross-functional team in breakthrough innovations. You communicate clearly and effectively to share your findings with non-technical audiences.
You are experienced in training language models or large computer vision models and have expertise in one or more key subdomains such as training optimization, self-supervised learning, explainability, or RLHF.
You have an engineering mindset, as shown by a track record of delivering models at scale in both training data and inference volumes. You have experience in delivering libraries, platforms, or solution-level code to existing products.

Basic Qualifications:

Currently has, or is in the process of obtaining, one of the following, with an expectation that the required degree will be obtained on or before the scheduled start date:
- A Bachelor's Degree in a quantitative field (Statistics, Economics, Operations Research, Analytics, Mathematics, Computer Science, or a related quantitative field) plus 5 years of experience performing data analytics
- A Master's Degree in a quantitative field (Statistics, Economics, Operations Research, Analytics, Mathematics, Computer Science, or a related quantitative field) or an MBA with a quantitative concentration plus 3 years of experience performing data analytics
- A PhD in a quantitative field (Statistics, Economics, Operations Research, Analytics, Mathematics, Computer Science, or a related quantitative field)

Preferred Qualifications:

Master’s Degree in “STEM” field (Science, Technology, Engineering, or Mathematics) plus 3 years of experience in data analytics, or PhD in “STEM” field (Science, Technology, Engineering, or Mathematics)
At least 1 year of experience working with AWS
At least 3 years’ experience in Python, Scala, or R
At least 3 years’ experience with machine learning
At least 3 years’ experience with SQL

Capital One will consider sponsoring a new qualified applicant for employment authorization for this position.

The minimum and maximum full-time annual salaries for this role are listed below, by location. Please note that this salary information is solely for candidates hired to perform work within one of these locations, and refers to the amount Capital One is willing to pay at the time of this posting. Salaries for part-time roles will be prorated based upon the agreed upon number of hours to be regularly worked.

McLean, VA: $161,800 - $184,600 for Princ Associate, Data Science

New York, NY: $176,500 - $201,400 for Princ Associate, Data Science

San Jose, CA: $176,500 - $201,400 for Princ Associate, Data Science

Candidates hired to work in other locations will be subject to the pay range associated with that location, and the actual annualized salary amount offered to any candidate at the time of hire will be reflected solely in the candidate’s offer letter.

This role is also eligible to earn performance-based incentive compensation, which may include cash bonus(es) and/or long-term incentives (LTI). Incentives could be discretionary or non-discretionary depending on the plan.

Capital One offers a comprehensive, competitive, and inclusive set of health, financial and other benefits that support your total well-being. Learn more at the Capital One Careers website. Eligibility varies based on full or part-time status, exempt or non-exempt status, and management level.

This role is expected to accept applications for a minimum of 5 business days.

No agencies please. Capital One is an equal opportunity employer (EOE, including disability/vet) committed to non-discrimination in compliance with applicable federal, state, and local laws. Capital One promotes a drug-free workplace. Capital One will consider for employment qualified applicants with a criminal history in a manner consistent with the requirements of applicable laws regarding criminal background inquiries, including, to the extent applicable, Article 23-A of the New York Correction Law; San Francisco, California Police Code Article 49, Sections 4901-4920; New York City’s Fair Chance Act; Philadelphia’s Fair Criminal Records Screening Act; and other applicable federal, state, and local laws and regulations regarding criminal background inquiries.

If you have visited our website in search of information on employment opportunities or to apply for a position, and you require an accommodation, please contact Capital One Recruiting at 1-800-304-9102 or via email at RecruitingAccommodation@capitalone.com. All information you provide will be kept confidential and will be used only to the extent required to provide needed reasonable accommodations.

For technical support or questions about Capital One's recruiting process, please send an email to Careers@capitalone.com

Capital One does not provide, endorse nor guarantee and is not liable for third-party products, services, educational tools or other information available through this site.

Capital One Financial is made up of several different entities. Please note that any position posted in Canada is for Capital One Canada, any position posted in the United Kingdom is for Capital One Europe and any position posted in the Philippines is for Capital One Philippines Service Corp. (COPSSC).

