Language Engineer, Artificial General Intelligence - Data Services

Amazon · Big Tech · Boston, MA · Data Science

A Language Engineer role on the AGI Data Services team, focused on dataset construction, linguistic annotation, dialog/semantic schemas, and automatic processing of large datasets. The role involves designing data collection tasks, analyzing data for insights, building data authoring tools, and collaborating with scientists to evaluate language models.

What you'd actually do

  1. Design data collection/creation tasks in response to science needs: author instructions, define and implement quality targets and mechanisms, provide day-to-day coordination of data collection efforts (including planning, scheduling, and reporting), and be responsible for the final deliverables
  2. Analyze and extract language-related insights from large amounts of data
  3. Build tools or tool prototypes for data analysis or data authoring, using Python or another scripting language
  4. Use modeling tools to bootstrap or test new functionalities
  5. Collaborate with scientists and software engineers to evaluate performance of language models
  6. Handle competing requests from a range of data customers

Skills

Required

  • computational linguistics
  • language data processing
  • language annotation
  • data markup
  • Python
  • speech and text language data
  • multiple languages

Nice to have

  • PhD in Computational Linguistics
  • bootstrapping language data collections
  • writing grammars
  • building FSTs
  • statistical language modeling
  • version control
  • agile development
  • database queries
  • data analysis processes
  • SQL
  • R
  • Matlab

What the JD emphasized

  • Master’s or higher degree in a relevant field (computational linguistics or an equivalent field involving computational analysis)
  • 2+ years of experience in computational linguistics or language data processing
  • Experience with language annotation and other forms of data markup
  • Experience with scripting languages, such as Python
  • Experience working with speech and text language data in multiple languages

Other signals

  • dataset construction
  • linguistic annotation
  • dialog/semantic schemas
  • automatic processing of large datasets
  • natural language processing
  • machine learning
  • AI systems aligned with human policies and preferences