AI Language Engineer II, Artificial General Intelligence - Data Services

Amazon · Big Tech · London, United Kingdom · Data Science

The AI Language Engineer II role focuses on developing diverse datasets for training and evaluating Amazon's AI models. This involves designing and executing complex data collection projects using synthetic data generation, model-supported methods, and human-in-the-loop approaches. The role requires analyzing data, building tools for data analysis and creation, and collaborating with cross-functional teams to evaluate AI model performance.

What you'd actually do

  1. Design complex data collections with human participants in response to science needs: author instructions, define and implement quality targets and mechanisms, provide day-to-day coordination of data collection efforts (including planning, scheduling, and reporting), and be responsible for the final deliverables
  2. Design and conduct complex data creation tasks using synthetic and model-based data generation methods, following state-of-the-art approaches
  3. Analyze and extract insights from large amounts of data
  4. Build tools or tool prototypes for data analysis or data creation, using Python or another scripting language
  5. Use modeling tools to bootstrap or test new AI functionalities

Skills

Required

  • Master's degree or above in Computational Linguistics, Linguistics, or a related field
  • Experience in computational linguistics, language data processing, semantics, and philosophy of language
  • Experience in Python, Perl, or another scripting language
  • Experience in speech and language data analysis

Nice to have

  • Experience owning and executing language data collection projects, including developing guidelines, label sets, and annotation workflows

What the JD emphasized

  • complex data collections
  • synthetic and model-based data generation methods
  • state-of-the-art approaches

Other signals

  • developing diverse datasets to train and evaluate Amazon AI models
  • synthetic data generation
  • model-supported data generation
  • human-in-the-loop data collections