Senior Software Engineer - ML Infrastructure

Plaid · Fintech · San Francisco, CA · Engineering

Plaid is seeking a Senior Software Engineer to build and operate the ML infrastructure that powers its AI-first strategy. The role focuses on designing and implementing scalable, reliable, and secure ML platforms, including feature stores, pipelines, deployment tooling, and inference systems, to accelerate the delivery of AI-powered financial products.

What you'd actually do

  1. Design and implement large-scale ML infrastructure, including feature stores, pipelines, deployment tooling, and inference systems.
  2. Drive the rollout of Plaid’s next-generation feature store to improve reliability and velocity of model development.
  3. Help define and evangelize an ML Ops “golden path” for secure, scalable model training, deployment, and monitoring.
  4. Ensure operational excellence of ML pipelines and services, including reliability, scalability, performance, and cost efficiency.
  5. Collaborate with ML product teams to understand requirements and deliver solutions that accelerate experimentation and iteration.

Skills

Required

  • 5+ years of industry experience as a software engineer, with a strong focus on ML/AI infrastructure or large-scale distributed systems.
  • Hands-on expertise in building and operating ML platforms (e.g., feature stores, data pipelines, training/inference frameworks).
  • Proven experience delivering reliable and scalable infrastructure in production.
  • Solid understanding of ML Ops concepts and tooling, as well as best practices for observability, security, and reliability.
  • Strong communication skills and ability to collaborate across teams.

Nice to have

  • Experience with ML Ops tools such as MLflow, SageMaker, or model registries.
  • Exposure to modern AI infrastructure environments (LLMs, real-time inference, agentic models).
  • Background in scaling ML infrastructure in fast-paced product environments.

What the JD emphasized

  • ML Infrastructure
  • feature stores
  • pipelines
  • deployment tooling
  • inference systems
  • ML Ops golden path
  • scalable
  • reliable
  • secure
