Data Scientist, AI/ML Model Quality

Apple · Big Tech · Austin, TX +3 · Software and Services

This role focuses on ensuring the quality of data used for training and evaluating AI/ML models, particularly in Generative AI systems within the Wallet, Payments, and Commerce domains. The Data Scientist will build and maintain intelligent systems, validation frameworks, and monitoring pipelines to ensure data integrity and model trustworthiness. Responsibilities include curating ground-truth datasets, auditing training data for bias, defining data quality metrics, integrating automated checks, and analyzing telemetry for GenAI workflows to identify failure modes and provide recommendations.

What you'd actually do

  1. Curate, analyze, and maintain gold-standard ground-truth datasets for model evaluation and continuous validation across both ML and GenAI systems.
  2. Audit training data for systemic bias and fairness gaps prior to model deployment; establish ongoing analytical checks to catch bias introduced by data drift over time.
  3. Define, track, and report key data quality metrics — completeness, accuracy, timeliness, validity — for engineering and leadership audiences.
  4. Design and define automated data quality rules and thresholds, partnering with Data Engineering to ensure these checks are integrated into model development and CI/CD workflows.
  5. Define and own ML observability metrics — model performance, output distributions, training-serving skew, silent degradation and feature drift — translating raw production signals into actionable insights for engineering and product teams.
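To make items 3 and 4 concrete, here is a minimal sketch of two data quality metrics (completeness and validity) checked against minimum thresholds. It is illustrative only and not taken from the posting: the record fields, the validity rules, and the threshold values are all assumptions, and it uses plain Python rather than any particular pipeline tooling.

```python
# Hypothetical sample records; the field names are illustrative only.
RECORDS = [
    {"txn_id": "a1", "amount": 12.50, "currency": "USD"},
    {"txn_id": "a2", "amount": None, "currency": "USD"},
    {"txn_id": "a3", "amount": -4.00, "currency": "EUR"},
    {"txn_id": "a4", "amount": 30.00, "currency": "usd"},
]


def completeness(records, field):
    """Share of records where `field` is present and not None."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) is not None)
    return filled / len(records)


def validity(records, field, is_valid):
    """Share of non-null values of `field` that pass the `is_valid` rule."""
    values = [r[field] for r in records if r.get(field) is not None]
    if not values:
        return 0.0
    return sum(1 for v in values if is_valid(v)) / len(values)


def failing_metrics(metrics, thresholds):
    """Names of metrics that fall below their minimum threshold, sorted."""
    return sorted(
        name for name, minimum in thresholds.items() if metrics[name] < minimum
    )


metrics = {
    "amount_completeness": completeness(RECORDS, "amount"),
    "amount_validity": validity(RECORDS, "amount", lambda v: v >= 0),
    "currency_validity": validity(
        RECORDS, "currency", lambda v: v in {"USD", "EUR"}
    ),
}

# Assumed thresholds; real values would be agreed with the owning teams.
THRESHOLDS = {
    "amount_completeness": 0.70,
    "amount_validity": 0.99,
    "currency_validity": 0.99,
}

failures = failing_metrics(metrics, THRESHOLDS)
print(metrics)
print("Below threshold:", failures)
```

In a CI/CD setting, a non-empty `failures` list could be used to fail the pipeline step, which is one way the "automated checks" in item 4 might be wired in.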

Skills

Required

  • Python (Pandas, NumPy, Scikit-learn)
  • SQL
  • complex data analysis
  • metric creation
  • validation
  • querying and analyzing large-scale datasets
  • distributed computing frameworks (e.g., PySpark, Spark, or distributed SQL)
  • statistical methods — hypothesis testing, distribution analysis, data drift detection, and statistical process control
  • defining and tracking ML model health metrics in production
  • GenAI or LLM systems
  • communication skills
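As an illustration of the data drift detection skill listed above, the sketch below computes a Population Stability Index (PSI) between a baseline sample and a newer sample. This is one common drift measure, not a method named in the posting; the equal-width binning, the `eps` smoothing value, and the sample data are assumptions made for the example.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples.

    Bin edges are equal-width over the range of `expected`. Values of
    `actual` outside that range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = ((hi - lo) / bins) or 1.0  # fall back if all values are equal

    def shares(sample):
        counts = [0] * bins
        for x in sample:
            idx = int((x - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps empty bins from producing log(0)
        return [max(c / len(sample), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical feature values: a baseline and a shifted copy of it.
baseline = [i / 100 for i in range(100)]
shifted = [x + 0.5 for x in baseline]

print("PSI, baseline vs itself:", psi(baseline, baseline))
print("PSI, baseline vs shifted:", psi(baseline, shifted))
```

A frequently cited rule of thumb treats PSI above roughly 0.25 as a significant shift, though any alerting threshold would need to be calibrated per feature.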

Nice to have

  • data visualization and dashboarding tools (e.g., Tableau, Apache Superset, Databricks)
  • LLM evaluation frameworks (e.g., LangSmith)
  • LLM-as-a-judge
  • Bayesian or causal graph-based approaches to synthetic data generation
  • confidence calibration techniques
  • uncertainty quantification
  • ML monitoring or observability platforms (e.g., MLflow, Weights & Biases, or equivalent)
  • privacy-constrained data
  • regulatory compliance frameworks (GDPR, DMA)
  • financial services, fintech, or consumer payment products

What the JD emphasized

  • model quality starts long before training — it starts with the data
  • silent degradation and data drift
  • ML observability
  • GenAI workflows
  • data quality metrics

Other signals

  • ML model quality
  • Generative AI technologies
  • data quality for AI systems
  • validation frameworks
  • monitoring pipelines
  • training and validation datasets
  • observability metrics
  • GenAI workflows