Senior Software Engineer - Data Infrastructure

Plaid · Fintech · San Francisco, CA · All Cost Centers

Senior Software Engineer on the Data Infrastructure team at Plaid, focusing on scaling data systems, improving ML development paths, and evolving data warehousing/lakehouse capabilities. The role involves contributing to the technical roadmap, leading key projects, and mentoring junior engineers.

What you'd actually do

  1. Contribute to the long-term technical roadmap for data-driven and machine learning iteration at Plaid.
  2. Lead key data infrastructure projects such as improving ML development golden paths, implementing offline streaming solutions for data freshness, building net-new ETL pipeline infrastructure, and evolving data warehouse or data lakehouse capabilities.
  3. Work with stakeholders in other teams and functions to define technical roadmaps for key backend systems and abstractions across Plaid.
  4. Debug, troubleshoot, and reduce operational burden for our Data Platform.
  5. Grow the team through mentorship and leadership, and review technical documents and code changes.

Skills

Required

  • 5+ years of software engineering experience
  • Extensive hands-on software engineering experience, with a strong track record of delivering successful projects within the Data Infrastructure or Platform domain at similar or larger companies.
  • Deep understanding of one of: ML Infrastructure systems (including Feature Stores, Training Infrastructure, Serving Infrastructure, and Model Monitoring) or Data Infrastructure systems (including Data Warehouses, Data Lakehouses, Apache Spark, Streaming Infrastructure, and Workflow Orchestration).
  • Strong cross-functional collaboration, communication, and project management skills, with a proven ability to coordinate work effectively.
  • Proficiency in coding, testing, and system design, ensuring reliable and scalable solutions.
  • Demonstrated leadership abilities, including experience mentoring and guiding junior engineers.

Nice to have

  • Databricks
  • Airflow
  • AWS EMR

What the JD emphasized

  • ML Infrastructure systems
  • Data Infrastructure systems

Other signals

  • ML development golden paths
  • offline streaming solutions for data freshness
  • evolving data warehouse or data lakehouse capabilities