Software Developer II - Data Platform

Redfin · Seattle, WA

Redfin is seeking a Software Developer II - Data Platform to design, build, and maintain scalable data pipelines and services. This role involves ingesting, processing, and delivering large-scale datasets for machine learning, analytics, and product teams, using technologies like Spark, Python/Java, and Airflow. The developer will also lead modernization efforts and collaborate with cross-functional teams.

What you'd actually do

  1. Design, build, and maintain scalable data pipelines that ingest, process, and organize large datasets (such as listings, clickstream, and external data sources) into Redfin’s data lake and analytics platforms.
  2. Develop and operate distributed data processing applications using technologies such as Spark, Python/Java, and workflow orchestration tools (e.g., Airflow/Windfarm) that power machine learning, product features, and analytics.
  3. Take ownership of data pipelines and platform services, ensuring reliability, performance, and data quality across Redfin’s data ecosystem.
  4. Lead the development and modernization of data pipelines by migrating legacy systems to Redfin’s lakehouse architecture and standardized platform frameworks.
  5. Collaborate with engineers, data scientists, and product teams to design and deliver datasets and data services that enable new product capabilities and machine learning models.

Skills

Required

  • 3-5 years of experience building software systems, data pipelines, or backend services in production environments
  • Experience developing large-scale applications backed by relational and non-relational databases
  • Experience working with distributed data processing technologies such as Spark, Kafka, or similar systems
  • Experience designing and implementing data pipelines and services that operate reliably at scale
  • Ability to collaborate effectively with engineers, data scientists, and product teams
  • Attention to data quality
  • Attention to system reliability
  • Experience building maintainable systems

Nice to have

  • Curiosity and a proactive approach to learning new technologies
  • Drive to continuously improve Redfin’s data platform and engineering practices

What the JD emphasized

  • large-scale datasets
  • machine learning
  • analytics
  • product teams
  • Spark
  • Python/Java
  • Airflow
  • cloud data platforms
  • data lake
  • lakehouse architecture
  • scalable data pipelines
  • distributed data processing applications
  • data quality
  • system reliability
  • maintainable systems