Senior Machine Learning Engineer, Model Risk Management

Block · Fintech · CA · Remote · 11003 Risk - Prod Dev - Square

This role independently challenges model owners across lending, fraud, and AML by reproducing results, setting acceptance thresholds, and determining model soundness. It involves hunting for silent errors, choosing robust evaluation methods, and shipping production code for validation tooling. A key focus is building agentic validation tooling and reasoning about ML systems end-to-end, including modern AI such as LLMs and agentic systems, to ensure reliability and compliance with regulatory standards.

What you'd actually do

  1. Independently challenge model owners across lending, fraud, and AML: reproduce their results, set and defend the acceptance thresholds, and own the call on whether a model is sound.
  2. Hunt the silent errors that make metrics lie, and prove them out before they reach production.
  3. Choose evaluation that holds up under real conditions: rare events, shifting populations, and drift that only shows up after launch.
  4. Work hands-on in codebases you did not write, learning the data, configs, and conventions, and ship production code in the tooling you build to validate them.
  5. Build the agentic validation tooling the team depends on, orchestrating agents that run in parallel.

Skills

Required

  • quantitative degree or equivalent experience
  • senior-IC depth building or validating models in a high-stakes domain
  • effective-challenge methodology
  • applied ML and statistics
  • experimentation and statistical rigor
  • production-quality Python
  • SQL on large datasets
  • reproducible, tested code
  • building with LLMs and agentic tools
  • communication to explain and defend conclusions
  • independence to operate under ambiguity

Nice to have

  • model risk management frameworks
  • fair-lending standards

What the JD emphasized

  • senior-IC depth building or validating models in a high-stakes domain such as credit, fraud, or financial crime
  • effective-challenge methodology
  • how a model holds up after launch and where its assumptions break
  • deep applied ML and statistics across model families
  • sound judgment about evaluation, calibration, and generalization
  • experimentation and statistical rigor
  • fluency with modern AI: building with LLMs and agentic tools, and the judgment to know when their output can be trusted

Other signals

  • evaluating AI systems
  • building agentic validation tooling
  • challenging models
  • model risk management