Language Engineer, Artificial General Intelligence - Data Services

Amazon · Big Tech · Boston, MA · Data Science

The Language Engineer will focus on dataset construction, linguistic annotation, dialog/semantic schemas, and automatic processing of large datasets to advance natural language processing and machine learning. Responsibilities include designing data collection tasks, analyzing data for insights, building data analysis tools, and collaborating with scientists to evaluate language models.

What you'd actually do

  1. Design data collection/creation tasks in response to science needs: author instructions, define and implement quality targets and mechanisms, provide day-to-day coordination of data collection efforts (including planning, scheduling, and reporting), and be responsible for the final deliverables
  2. Analyze and extract language-related insights from large amounts of data
  3. Build tools or tool prototypes for data analysis or data authoring, using Python or another scripting language
  4. Use modeling tools to bootstrap or test new functionalities
  5. Collaborate with scientists and software engineers to evaluate performance of language models
  6. Handle competing requests from a range of data customers

Skills

Required

  • Master’s or higher degree in a relevant field (computational linguistics or equivalent field with computational analysis)
  • 2+ years of experience in computational linguistics or language data processing
  • Experience with language annotation and other forms of data markup
  • Experience with scripting languages, such as Python
  • Experience working with speech and text language data in multiple languages
  • Excellent communication, strong organizational skills, and close attention to detail
  • Comfortable working in a fast-paced, highly collaborative, dynamic work environment

Nice to have

  • PhD in Computational Linguistics (or equivalent field with computational emphasis)
  • Expertise in bootstrapping language data collections in a quickly changing environment
  • Comfortable working with speech and text language data in multiple languages
  • Practical familiarity with Machine Learning and language modeling
  • Practical knowledge of version control and agile development
  • Familiarity with database queries and data analysis processes (SQL, R, Matlab, etc.)

What the JD emphasized

  • computational linguistics
  • language data processing
  • language annotation
  • scripting languages, such as Python
  • speech and text language data in multiple languages

Other signals

  • dataset construction
  • linguistic annotation
  • dialog/semantic schemas
  • automatic processing of large datasets
  • natural language processing
  • machine learning
  • AI systems aligned with human policies and preferences