AIML - Sr Software Data Engineer, Evaluation

Apple · Big Tech · Cupertino, CA · Machine Learning and AI

This role is for a Sr. Software Data Engineer on the Evaluation Data Engineering team at Apple. The team builds and maintains a scalable data platform that supports Siri, Search, and Machine Learning. The engineer will design and optimize large-scale stream and batch processing data pipelines using technologies like Flink and Spark, focusing on data quality and performance to enable ML applications and product insights. The role requires significant experience in distributed data processing systems and full-stack development.

What you'd actually do

  1. build the scalable and reliable data platform that powers Siri, Search, and Machine Learning across Apple
  2. design a unified and groundbreaking data processing framework using Flink and/or Spark
  3. optimize performance, ensure data quality, and contribute to a long-term vision that extends the framework’s capabilities to new user scenarios and groundbreaking machine learning applications
  4. transform raw data into datasets that drive innovation
  5. automate dataset lifecycles with strong quality standards

Skills

Required

  • designing, building, and maintaining distributed data processing systems at scale
  • stream and/or batch processing technologies such as Flink, Spark, Kafka, Airflow, Iceberg, and Trino
  • full-stack development
  • Java
  • Scala
  • Python
  • MS or BS in Computer Science, Engineering, Math, Statistics, or a related field, or equivalent practical experience in data engineering

Nice to have

  • algorithms
  • data structures
  • data modeling
  • SQL
  • large-scale, complex, and high-dimensional datasets
  • machine learning algorithms or pipelines, particularly in the context of data engineering
  • supporting ML engineers or data scientists with feature engineering or model data pipelines
  • testing tools and methodologies for validating large-scale, distributed data systems
  • design
  • testing
  • version control
  • CI/CD best practices
  • working independently in a fast-paced, ambiguous environment
  • communication
  • problem-solving skills

What the JD emphasized

  • 7+ years of experience designing, building, and maintaining distributed data processing systems at scale
  • 5+ years of hands-on experience with stream and/or batch processing technologies such as Flink, Spark, Kafka, Airflow, Iceberg, and Trino
  • 2-3 years of experience in full-stack development