What you'd actually do

Independently challenge model owners across lending, fraud, and AML: reproduce their results, set and defend the acceptance thresholds, and own the call on whether a model is sound.

Hunt the silent errors that make metrics lie, and prove them out before they reach production.

Choose evaluation that holds up under real conditions: rare events, shifting populations, and drift that only shows up after launch.

Work hands-on in codebases you did not write, learning the data, configs, and conventions, and ship production code in the tooling you build to validate them.

Build the agentic validation tooling the team depends on, orchestrating agents that run in parallel.

Skills

Required

quantitative degree or equivalent experience
senior-IC depth building or validating models in a high-stakes domain
effective-challenge methodology
applied ML and statistics
experimentation and statistical rigor
production-quality Python
SQL on large datasets
reproducible, tested code
building with LLMs and agentic tools
communication to explain and defend conclusions
independence to operate under ambiguity

Nice to have

model risk management frameworks
fair-lending standards

What the JD emphasized

Senior-IC depth building or validating models in a high-stakes domain such as credit, fraud, or financial crime

Effective-challenge methodology

How a model holds up after launch and where its assumptions break

Deep applied ML and statistics across model families

Sound judgment about evaluation, calibration, and generalization

Experimentation and statistical rigor

Fluency with modern AI: building with LLMs and agentic tools, and the judgment to know when their output can be trusted

Block builds simple, powerful tools that make progress towards an economy that’s truly open to all. Each of our brands unlocks different aspects of the economy for more people. Square makes commerce and financial services accessible to sellers. Cash App is the easy way to spend, send, and store money. Afterpay is transforming the way customers manage their spending over time. TIDAL is a music platform that empowers artists to thrive as entrepreneurs. Bitkey is a simple self-custody wallet built for bitcoin. Proto is a suite of bitcoin mining products and services. Together, we’re helping build a financial system that is open to everyone. Join us.

The Role

Block lends, moves money, and screens for financial crime at enormous scale, and one bad model can mean millions in credit losses, suspicious activity that goes unreported, or a fair lending violation. Model Risk Management is the independent function that decides whether a model is sound enough to put in front of customers and regulators.

The failures that matter rarely announce themselves: a model can clear every headline metric and still be broken underneath. It can pass clean at launch and then quietly drift as the population shifts, until the loss it was supposed to prevent surfaces months later. The hard part is finding what looks right and is wrong, then proving it well enough to hold up under questioning. Much of the work arrives under-specified, so you scope it into a defensible plan, ask the questions that surface the real requirements, and defend your tradeoffs to the people who built the model you are challenging.

The same scrutiny you apply to models applies to AI. We build the tooling that lets a lean team validate at scale, so you critically evaluate what it produces and own the evaluation that confirms its output is reliable enough to act on. That work matters most for the GenAI and agentic systems most teams have not figured out how to oversee yet.

As a senior individual contributor, you lead through technical depth and cross-team scope, and you partner widely across the organization. You work with the first-line modelers you challenge, the Legal, Compliance, and fair-lending teams who rely on your analysis, and the auditors and bank partners who carry it into regulatory engagements. This role is remote-friendly within approved US locations.

You Will

Independently challenge model owners across lending, fraud, and AML: reproduce their results, set and defend the acceptance thresholds, and own the call on whether a model is sound.
Hunt the silent errors that make metrics lie, and prove them out before they reach production.
Choose evaluation that holds up under real conditions: rare events, shifting populations, and drift that only shows up after launch.
Work hands-on in codebases you did not write, learning the data, configs, and conventions, and ship production code in the tooling you build to validate them.
Build the agentic validation tooling the team depends on, orchestrating agents that run in parallel.
Reason about ML systems end to end — how features, training, serving, monitoring, and scale fit together — to evaluate and challenge an owner's design.
Tie explainability and fair-lending findings on consumer credit models back to the model and product decisions that follow.
Help define how Block validates the systems at the frontier of production AI, setting standards where none exist yet.

You Have

A quantitative degree or equivalent experience, and senior-IC depth building or validating models in a high-stakes domain such as credit, fraud, or financial crime.
Command of effective-challenge methodology: reproduction, conceptual-soundness review, benchmarking, stress testing, and outcomes analysis, with an eye for how a model holds up after launch and where its assumptions break.
Deep applied ML and statistics across model families, from regression and tree ensembles to deep learning, with sound judgment about evaluation, calibration, and generalization.
Experimentation and statistical rigor: holdout and experiment design, reasoning about uncertainty, and evaluating a model beyond aggregate accuracy.
Solid software and data engineering: production-quality Python, SQL on large datasets, and reproducible, tested code.
Fluency with modern AI: building with LLMs and agentic tools, and the judgment to know when their output can be trusted.
Familiarity with model risk management frameworks and fair-lending standards, with the specifics learnable on the job.
The communication to explain and defend your conclusions to model owners and senior stakeholders, and the independence to operate under ambiguity.

Technologies We Use and Teach

Python (NumPy, Pandas, scikit-learn, LightGBM, XGBoost, PyTorch)
AI dev tools: Claude Code, Cursor, Copilot; agent skills and frameworks for building LLM-powered tooling
MLflow / Databricks; Prefect on GCP Vertex AI
Snowflake and cloud object storage
GitHub and CI (ruff, pytest)
Jira and Linear
GCP and AWS

We're working to build a more inclusive economy where our customers have equal access to opportunity, and we strive to live by these same values in building our workplace. Block is an equal opportunity employer evaluating all employees and job applicants without regard to identity or any legally protected class. We will consider qualified applicants with arrest or conviction records for employment in accordance with state and local laws and “fair chance” ordinances.

We believe in being fair, and are committed to an inclusive interview experience, including providing reasonable accommodations to disabled applicants throughout the recruitment process. We encourage applicants to share any needed accommodations with their recruiter, who will treat these requests as confidentially as possible. Want to learn more about what we're doing to build a workplace that is fair and square? Check out our I+D page.

Block takes a market-based approach to pay, and pay may vary depending on your location. U.S. locations are categorized into one of four zones based on a cost of labor index for that geographic area. The successful candidate’s starting pay will be determined based on job-related skills, experience, qualifications, work location, and market conditions. These ranges may be modified in the future.

To find a location’s zone designation, please refer to this resource. If a location of interest is not listed, please speak with a recruiter for additional information.

Zone A: $228,700–$343,100 USD
Zone B: $217,300–$325,900 USD
Zone C: $205,900–$308,900 USD
Zone D: $194,500–$291,700 USD

Application Guidelines

Candidates may submit up to 9 active applications within a 60-day period. Reapplications to the same role are accepted 90 days after a previous application has been reviewed.

Use of AI in Our Hiring Process

We may use automated AI tools to evaluate job applications for efficiency and consistency. These tools comply with local regulations, including bias audits, and we handle all personal data in accordance with state and local privacy laws.

Every benefit we offer is designed with one goal: empowering you to do the best work of your career while building the life you want. Remote work, medical insurance, flexible time off, retirement savings plans, and modern family planning are just some of our offerings. Check out our other benefits at Block.

Block, Inc. (NYSE: XYZ) builds technology to increase access to the global economy. Each of our brands unlocks different aspects of the economy for more people. Square makes commerce and financial services accessible to sellers. Cash App is the easy way to spend, send, and store money. Afterpay is transforming the way customers manage their spending over time. TIDAL is a music platform that empowers artists to thrive as entrepreneurs. Bitkey is a simple self-custody wallet built for bitcoin. Proto is a suite of bitcoin mining products and services. Together, we’re helping build a financial system that is open to everyone.

Senior Machine Learning Engineer, Model Risk Management
