Data Domain Architect Lead - Data & Annotation

JPMorgan Chase · Banking · Bengaluru, Karnataka, India · Consumer & Community Banking

Lead data labeling initiatives for financial industry ML models, focusing on creating reliable datasets, managing annotation operations, and driving innovation in data quality and evaluation, including LLM-as-judge and agentic workflows.

What you'd actually do

  1. Translate business requirements and ML objectives into implementable specifications, schemas, guidelines, and quality metrics; define success measures and key results for each labeling effort; and actively manage scope, risks, dependencies, and stakeholder communications
  2. Own the annotation operating model, including workflow design, task routing, queue management, and delivery governance
  3. Scale labeling capacity across multiple lines of business while maintaining consistency, quality, throughput, and clear documentation
  4. Own data cleaning and preparation processes to resolve noise, duplicates, inconsistencies, and labeling defects
  5. Establish annotation metrics, reliability standards, and a measurable quality framework, including calibration routines, gold datasets, reviews, and feedback loops (a minimal sketch of such a check follows this list)
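
To make item 5 concrete, below is a minimal Python sketch of the kind of per-batch check such a quality framework might run: gold-set accuracy plus pairwise inter-annotator agreement. The column names (item_id, annotator, label, gold_label) and the kappa floor of 0.7 are illustrative assumptions, not part of the posting.

```python
from itertools import combinations

import pandas as pd
from sklearn.metrics import cohen_kappa_score


def batch_quality_report(df: pd.DataFrame, kappa_floor: float = 0.7) -> dict:
    """Per-batch quality check: gold-set accuracy plus pairwise Cohen's kappa."""
    # Gold-set accuracy over rows that carry a trusted reference label.
    gold = df.dropna(subset=["gold_label"])
    gold_accuracy = float((gold["label"] == gold["gold_label"]).mean()) if len(gold) else None

    # Pairwise agreement over items that both annotators labeled.
    pivot = df.pivot_table(index="item_id", columns="annotator",
                           values="label", aggfunc="first")
    kappas = {}
    for a, b in combinations(pivot.columns, 2):
        pair = pivot[[a, b]].dropna()
        if len(pair):
            kappas[(a, b)] = cohen_kappa_score(pair[a], pair[b])

    return {
        "gold_accuracy": gold_accuracy,
        "pairwise_kappa": kappas,
        "pairs_below_floor": [p for p, k in kappas.items() if k < kappa_floor],
    }
```

Annotator pairs falling below the floor would then feed the calibration routines and feedback loops the role describes.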

Skills

Required

  • Master's or PhD degree in Computational Linguistics, Linguistics, Computer Science, Data Science or a related field.
  • 5+ years of experience delivering data products or machine learning-enabled products across the full product lifecycle
  • Hands-on experience developing annotation metrics, performing annotation, and conducting annotation reviews
  • Experience running text data labeling programs end-to-end, including guideline and taxonomy design and annotation platform operations
  • Hands-on experience in Python for automation, data analysis, and cleaning and validating structured and unstructured datasets, plus experience using Git for version control
  • Hands-on prompt engineering experience for LLM labeling workflows (for example, pre-labeling, synthetic data generation, and instruction clarity)
  • Working knowledge of LLM-as-judge methods, including rubric design and integrating automated signals into human-in-the-loop review (see the sketch after this list)
  • Hands-on experience in designing labeling quality measurement (for example, gold datasets, calibration, sampling, and inter-annotator agreement targets)
  • Hands-on experience in benchmarking data quality and evaluation outcomes and translating results into product and process improvements
  • Strong stakeholder management, written and verbal communication, and disciplined execution under deadlines
  • Experience leading cross-functional delivery across technology, operations, and vendor partners
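
As a rough illustration of the LLM-as-judge requirement above, here is a hedged sketch of a rubric-scored judge pass that routes low-scoring or unparseable items back to human review. The call_llm() function is a hypothetical placeholder for whichever model client the team uses, and the rubric text, 1-5 scale, and threshold of 4 are assumptions for illustration only.

```python
import json

# Rubric text and score scale are illustrative assumptions.
RUBRIC = """You are reviewing a proposed label for a text item.
Score it 1-5 against the labeling guideline:
  5 = label matches the guideline definition exactly
  3 = plausible, but guideline criteria are only partly satisfied
  1 = label contradicts the guideline
Return JSON only: {"score": <int>, "rationale": "<one sentence>"}"""


def call_llm(prompt: str) -> str:
    # Hypothetical placeholder -- wire in whichever model client is in use.
    raise NotImplementedError


def judge_item(text: str, proposed_label: str, threshold: int = 4) -> dict:
    """Score one candidate label; route weak or unparseable verdicts to humans."""
    prompt = f"{RUBRIC}\n\nItem: {text}\nProposed label: {proposed_label}"
    try:
        verdict = json.loads(call_llm(prompt))
        needs_human = verdict.get("score", 0) < threshold
    except (json.JSONDecodeError, TypeError):
        verdict, needs_human = {"score": None, "rationale": "unparseable"}, True
    return {"verdict": verdict, "route_to_human_review": needs_human}
```

The same pattern also covers the pre-labeling use case: swap the rubric for a labeling instruction and treat high-confidence verdicts as candidate labels rather than reviews.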

Nice to have

  • Experience managing globally distributed annotation teams and third-party vendors
  • Familiarity with metadata management, data cataloging, and dataset lineage practices
  • Experience applying machine learning to data quality monitoring and anomaly detection (sketched after this list)
  • Track record influencing senior stakeholders and aligning priorities through measurable OKRs
  • Experience working with privacy, data governance, or model risk controls related to training data
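
For the anomaly-detection item above, a minimal sketch of how ML might flag suspect annotation batches from quality metrics that are already tracked; the feature columns and the 5% contamination rate are illustrative assumptions.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest


def flag_anomalous_batches(metrics: pd.DataFrame) -> pd.DataFrame:
    """Mark annotation batches whose quality metrics look out of line."""
    # One row per batch; columns such as gold_accuracy, mean_kappa, and
    # rejection_rate are illustrative -- use whatever the framework tracks.
    features = metrics.select_dtypes("number")
    features = features.fillna(features.median())
    model = IsolationForest(contamination=0.05, random_state=0)
    out = metrics.copy()
    out["is_anomalous"] = model.fit_predict(features) == -1
    return out
```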

What the JD emphasized

  • Hands-on experience in Python for automation, data analysis, cleaning and validating structured and unstructured datasets
  • Hands-on prompt engineering experience for LLM labeling workflows
  • Working knowledge of LLM-as-judge methods
  • Hands-on experience in designing labeling quality measurement
  • Hands-on experience in benchmarking data quality and evaluation outcomes

Other signals

  • data annotation
  • ML models
  • LLM
  • agentic workflows
  • quality evaluation