Americas Business Process Re-engineering Data Engineer

Apple · Big Tech · Austin, TX · Software and Services

Data Engineer role focused on building and maintaining scalable data infrastructure for analytics, machine learning, and AI-driven decision-making within Operations. The role involves designing data pipelines, building data models, implementing data observability, and collaborating with data science and ML engineering teams. It also emphasizes leveraging AI-assisted tools for development and researching emerging GenAI data tooling.

What you'd actually do

  1. Engage with business and analytics teams to deeply understand data needs and translate requirements into robust, scalable engineering solutions that directly impact Operations decisions
  2. Design and implement end-to-end data pipelines and architectures, from ingestion and transformation through delivery, across batch and real-time streaming workloads
  3. Build and maintain high-quality data models (dimensional, relational, or knowledge graph-based) using modern transformation frameworks such as dbt, powering analytics and AIML use cases at scale
  4. Architect and operate data workflows using orchestration tools (e.g., Apache Airflow) with built-in monitoring, alerting, and SLA management (see the sketch after this list)
  5. Implement data observability, lineage tracking, and validation frameworks to uphold data integrity and trustworthiness across the platform
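
A minimal sketch of items 4 and 5 above, assuming Airflow 2.x (2.4+) with dbt installed on the worker; the DAG id, the dbt selector, the SLA window, and the validation logic are illustrative stand-ins, not the team's actual pipeline:

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator


    def alert_on_failure(context):
        # Hook point for alerting (Slack, PagerDuty, email); here we just log.
        print(f"Task failed: {context['task_instance'].task_id}")


    def validate_orders(**_):
        # Placeholder validation; a real pipeline would query the warehouse
        # or run a framework such as Great Expectations here.
        row_count = 42  # stand-in for a warehouse query result
        if row_count == 0:
            raise ValueError("staging table is empty")


    with DAG(
        dag_id="orders_daily",                        # hypothetical pipeline
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
        default_args={
            "retries": 2,
            "retry_delay": timedelta(minutes=5),
            "sla": timedelta(hours=2),                # Airflow records SLA misses
            "on_failure_callback": alert_on_failure,
        },
    ) as dag:
        transform = BashOperator(
            task_id="dbt_run",
            bash_command="dbt run --select staging",  # assumes a dbt project on the worker
        )
        validate = PythonOperator(
            task_id="validate_orders",
            python_callable=validate_orders,
        )

        transform >> validate

Pairing the dbt transformation with a downstream validation task keeps data quality checks inside the same monitored, SLA-bound workflow rather than in a separate system.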

Skills

Required

  • Python
  • SQL
  • dbt
  • Spark
  • Kafka/Flink
  • Snowflake
  • Delta Lake
  • Apache Iceberg
  • Apache Airflow
  • Docker
  • Kubernetes
  • GenAI tooling
  • Agentic AI tooling
  • LLM-assisted code generation
  • Vector databases
  • RAG pipelines
  • Data visualization
  • Self-service analytics platforms

Nice to have

  • MS in Computer Science, Data Engineering, Statistics, Applied Math, Data Science, or Operations Research
  • Tableau
  • Streamlit
  • ThoughtSpot

What the JD emphasized

  • 8+ years of industry experience, or a BS in a related field with 10+ years of hands-on industry experience
  • Domain expertise in supply chain, operations management, logistics, planning & forecasting, production integration, or channel management
  • Demonstrated expertise building and operating large-scale ETL/ELT pipelines using Python, SQL, and modern frameworks (dbt, Spark, Kafka/Flink for streaming); see the streaming sketch after this list
  • Proficiency with cloud data platforms (e.g., Snowflake) and open table formats (Delta Lake, Apache Iceberg)
  • Strong command of advanced SQL for complex data modeling, query optimization, and analytics engineering
  • Experience with workflow orchestration tools (Apache Airflow or equivalent) and building production-grade, monitored pipelines
  • Hands-on experience implementing data quality frameworks, observability tooling, and data lineage tracking in production environments
  • Experience implementing and productionizing GenAI and Agentic AI tooling, including LLM-assisted code generation, MCP servers, and AI-powered data pipeline automation
  • Track record of staying current with industry best practices, rapidly adopting emerging technologies (e.g., vector databases, RAG pipelines, AI-native data tools), and building functional prototypes to validate concepts
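
A minimal sketch of the streaming ingest path named above (Spark reading from Kafka into an open table format), assuming PySpark with the Kafka connector and Delta Lake available on the cluster; the broker address, topic, schema, and paths are illustrative:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import (DoubleType, StringType, StructField,
                                   StructType, TimestampType)

    spark = SparkSession.builder.appName("orders_stream").getOrCreate()

    order_schema = StructType([
        StructField("order_id", StringType()),
        StructField("customer_id", StringType()),
        StructField("amount_usd", DoubleType()),
        StructField("ordered_at", TimestampType()),
    ])

    # Read the raw event stream from Kafka.
    raw = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
        .option("subscribe", "orders")                     # placeholder topic
        .load()
    )

    # Kafka values arrive as bytes; parse the JSON payload into typed columns.
    orders = (
        raw.selectExpr("CAST(value AS STRING) AS payload")
        .select(F.from_json("payload", order_schema).alias("o"))
        .select("o.*")
    )

    # Land the parsed events in a Delta table, checkpointing for fault tolerance.
    query = (
        orders.writeStream.format("delta")
        .option("checkpointLocation", "/checkpoints/orders")
        .outputMode("append")
        .start("/tables/bronze/orders")
    )
    query.awaitTermination()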

Other signals

  • Designing and building modern, scalable data infrastructure that powers analytics, machine learning, and AI-driven decision-making
  • Operationalize models and ensure data infrastructure supports production AIML workflows
  • Leverage AI-assisted development tools (e.g., GitHub Copilot, Claude) and LLM-powered agents to accelerate pipeline authoring, code review, documentation, and transformation logic generation from natural language specifications (see the sketch at the end of this section)
  • Research and evaluate emerging data engineering technologies including streaming architectures, GenAI-powered data tooling, and next-generation warehousing to expand the team’s capabilities and accelerate innovation
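
A minimal sketch of LLM-assisted transformation authoring as described above, assuming the anthropic Python SDK; the model name and the spec are placeholders, and any generated SQL would still go through human review and testing before reaching production:

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    spec = (
        "Given a table raw.orders(order_id, customer_id, amount_usd, ordered_at), "
        "write a SQL model that aggregates daily revenue per customer."
    )

    message = client.messages.create(
        model="claude-sonnet-placeholder",  # placeholder; pin a real model version
        max_tokens=1024,
        messages=[{"role": "user", "content": f"Generate dbt-style SQL. Spec: {spec}"}],
    )

    # Candidate SQL for review, not for direct deployment.
    print(message.content[0].text)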