Software Engineer II, Big Data, tvScientific

Pinterest · Consumer · San Francisco, CA · tvScientific

This is a Software Engineer II, Big Data role at Pinterest's tvScientific, focused on building and scaling robust data infrastructure and pipelines using Spark with Scala in AWS. The role involves designing data solutions, implementing knowledge graphs, and optimizing AWS resources, in collaboration with Data Science and Product teams. Although the company emphasizes AI, this is primarily a data engineering role that supports AI initiatives.

What you'd actually do

  1. Design and implement robust data infrastructure in AWS, using Spark with Scala
  2. Evolve our core data pipelines to efficiently scale for our massive growth
  3. Store data in optimal engines and formats, matching your designs to our performance needs and cost factors
  4. Collaborate with our cross-functional teams to design data solutions that meet business needs
  5. Design and implement knowledge graphs, exposing their functionality via both batch processing and APIs

Skills

Required

  • Spark
  • Scala
  • AWS
  • SQL
  • data lakes
  • cloud warehouses
  • storage formats
  • APIs

Nice to have

  • adtech
  • data governance
  • data quality
  • metadata management
  • access controls
  • privacy-by-design
  • sensitive or regulated data handling
  • Apache Iceberg
  • Delta
  • building out a Data Engineering function
  • machine learning pipelines

What the JD emphasized

  • Production data engineering experience
  • Proficiency in Spark and Scala; proven experience building data infrastructure in Spark using Scala is preferred
  • Experience delivering significant technical initiatives and building reliable, large-scale services
  • Experience delivering APIs backed by relationship-heavy datasets
  • Strong proficiency in AWS services
  • Expertise in SQL for data manipulation and extraction
  • Demonstrated ability to use AI to improve speed and quality in your day-to-day workflow for relevant outputs
  • Strong track record of critical evaluation and verification of AI-assisted work (e.g., testing, source-checking, data validation, peer review)
  • High integrity and ownership: you protect sensitive data, avoid over-reliance on AI, and remain accountable for final decisions and deliverables