Data Engineer

Apple · Big Tech · Cupertino, CA +1 · Software and Services

Data Engineer role focused on building and maintaining high-volume data processing pipelines, ingestion, and ETL/ELT applications. The role involves applying Generative AI, RAG, and ML for anomaly detection to enhance data analytics capabilities, with a focus on cloud environments using technologies like Kafka, Spark, Flink, Docker, and Kubernetes.

What you'd actually do

  1. Collaborating with data scientists across functional teams to define and enhance performance metrics that provide valuable insights for stakeholders
  2. Building and maintaining ingestion pipelines for real-time data processing
  3. Building and maintaining real-time applications driving operational monitoring
  4. Building and maintaining batch ETL/ELT applications populating our data warehouse
  5. Applying Generative AI and Retrieval Augmented Generation (RAG) techniques to enhance data analytics capabilities
  6. Applying Machine Learning technologies for anomaly detection

Skills

Required

  • Bachelor's degree in Computer Science or equivalent professional experience
  • Experience building large-scale distributed systems in Java, Python, or similar languages
  • Proficient in SQL
  • Experience with data warehouse architectures and dimensional modeling
  • Demonstrated ability to conduct performance analysis and troubleshoot large-scale distributed systems
  • Strong collaboration skills, with the ability to understand complex architectures and work effectively across teams
  • Hands-on experience with Docker and Kubernetes

Nice to have

  • Production experience with Apache Kafka, Spark, or Flink
  • Working knowledge of Trino or similar distributed query engines
  • Experience building multi-agent AI systems or agentic workflows
  • Familiarity with Retrieval Augmented Generation (RAG) techniques working in conjunction with LLMs
  • Experience with creating and consuming Model Context Protocol (MCP) services

What the JD emphasized

  • very-high-volume processing pipeline
  • next generation of processing pipeline and data analytics platform
  • real-time data processing
  • real-time applications
  • large scale data collection and analytics pipelines
  • large scale distributed systems

Other signals

  • Leveraging Generative AI and Machine Learning technologies
  • Applying Generative AI and Retrieval Augmented Generation (RAG) techniques
  • Applying Machine Learning technologies for anomaly detection