Data Integration Engineer

Apple · Big Tech · Hyderabad, India · Software and Services

Data Integration Engineer responsible for building and maintaining data pipelines for structured and unstructured data to support AIML model development and deployment. Focuses on data infrastructure, data quality, and integration for sales processes.

What you'd actually do

  1. Design and develop data integrations and data ingestion processes for Apple internal and external data.
  2. Build and maintain data pipelines for ingesting, processing, and transforming unstructured data sources, such as customer feedback, social media data, or sales call recordings.
  3. Develop data quality monitoring and validation processes specifically for AIML datasets, including identifying and addressing data bias.
  4. Work with data scientists to understand data requirements for AIML model training and deployment, ensuring data is available in the appropriate format and quality.
  5. Play an active role in the development and maintenance of user documentation, including data models, mapping rules, and data dictionaries.

Skills

Required

  • 5+ years of experience in designing, building, and maintaining scalable data solutions for large-scale analytics.
  • Proficiency in SQL and development experience with cloud database environments such as Snowflake, Redshift, or Databricks.
  • Proficiency in programming languages such as Python, Java, or R, and in open-source frameworks for distributed processing such as Hadoop and Spark.
  • Hands-on experience with development tools in a modern cloud data stack: code management and versioning with Git, CI/CD tooling, automation and orchestration with Apache Airflow or similar, and monitoring and alerting.
  • Experience architecting and developing data pipelines using ETL tools and API integrations with on-premises and cloud-based source systems.
  • Strong understanding of data modeling, data warehousing, and ETL concepts.
  • Experience with cloud platforms such as AWS, Azure, and Google Cloud.
  • Handling unstructured data (e.g., JSON, Parquet, text, images, audio, video).
  • Experience with data governance and observability tools (e.g., DataHub, Collibra).
  • Hands-on experience using dbt for transforming data in a cloud data warehouse.
  • Experience building and maintaining dbt models, tests, and documentation.
  • Understanding of dbt macros and Jinja templating.
  • Experience articulating and translating business questions into data solutions and proven ability to lead development projects from start to finish.
  • Experience and understanding of API development (REST, GraphQL, gRPC).
  • Broad knowledge of web standards relating to REST, HTTP, JSON, etc.
  • Experience with basic frontend development (HTML, CSS, JavaScript, Bootstrap, jQuery, etc.).
  • Experience with data labeling and annotation tools and processes.
  • Familiarity with AI/ML model development lifecycle and data needs for training and deployment.
  • Able to balance competing priorities, long-term projects, and ad hoc requirements.
  • Ability to work in a fast-paced, dynamic, constantly evolving business environment.

Nice to have

  • Experience with JupyterLab or Dataiku is a plus.
  • Familiarity with dbt best practices (modular models, sources, refs, macros)
  • BS or MS in Computer Science or equivalent industry experience.

What the JD emphasized

  • Building data infrastructure to support AIML initiatives.
  • Building and maintaining data pipelines for both structured and unstructured data, enabling the development and deployment of AIML models.
  • Developing data quality monitoring and validation processes specifically for AIML datasets, including identifying and addressing data bias.
  • Working with data scientists to understand data requirements for AIML model training and deployment.