Senior Manager, Software Engineering (ML Platform)

Affirm · Fintech · United States · Remote · Checkout

Senior Manager to lead the ML Platform engineering organization, which builds and operates the critical infrastructure for ML capabilities at Affirm. The role focuses on real-time and batch feature computation, model training, and model serving at scale, including transformer-based models and GPU compute.

What you'd actually do

  1. Own the technical strategy and roadmap for ML Platform, covering real-time and batch feature computation, model training infrastructure, and model serving at scale.
  2. Lead and grow a team of engineering managers, while staying hands-on with the technical direction and maintaining close partnership with ICs.
  3. Continuously evolve the platform to stay ahead of the frontier — anticipating where AI and ML are heading and building the infrastructure that makes those capabilities possible at Affirm before they become urgent needs. This includes large-scale training and serving of transformer-based models, GPU compute, reinforcement learning, and whatever comes next.
  4. Partner with ML modeling, product, and infrastructure leadership to ensure the platform accelerates Affirm's most critical ML initiatives.
  5. Establish engineering excellence across the organization: reliability, observability, developer experience, and operational rigor.

Skills

Required

  • Software engineering management
  • ML infrastructure
  • Large-scale systems
  • Distributed systems
  • Transformer architectures
  • GPU compute
  • Reinforcement learning
  • Data pipelines
  • Model serving
  • Training infrastructure

Nice to have

  • Applied ML modeling experience

What the JD emphasized

  • lead our ML Platform engineering organization
  • builds and operates the critical infrastructure enabling every ML capability at Affirm
  • technically demanding role at the intersection of distributed systems, modern AI, and platform engineering
  • manage a team of engineering managers
  • stay deeply connected to the engineers building the platform
  • continuously evolve the platform to stay ahead of the frontier
  • building the infrastructure that makes those capabilities possible at Affirm before they become urgent needs
  • large-scale training and serving of transformer-based models
  • GPU compute
  • reinforcement learning
  • Partner with ML modeling, product, and infrastructure leadership
  • ensure the platform accelerates Affirm's most critical ML initiatives
  • Establish engineering excellence across the organization
  • reliability, observability, developer experience, and operational rigor
  • 12+ years of industry experience in software and/or machine learning engineering
  • significant hands-on software engineering experience
  • 4+ years managing engineering managers
  • Deep expertise in building and operating large-scale ML infrastructure
  • feature stores, model serving systems, training pipelines, or equivalent
  • Strong understanding of data
  • data pipelines, data quality, and how data shapes model behavior and platform design
  • Fluency with modern ML
  • deep neural networks, transformer architectures, reinforcement learning, large-scale GPU training and serving
  • Strong systems thinking
  • comfortable reasoning from low-level infrastructure decisions to broad architectural trade-offs
  • Track record of building platforms that meaningfully accelerate the productivity and impact of ML teams
  • Experience on the applied ML modeling side is a plus
  • understanding how models are built makes you a better platform builder
  • Experience navigating ambiguity and leading through organizational complexity

Other signals

  • ML Platform
  • model training infrastructure
  • model serving at scale
  • large-scale training and serving of transformer-based models
  • GPU compute