Language Engineer, Artificial General Intelligence - Data Services

Amazon · Big Tech · Boston, MA · Data Science

The Language Engineer will focus on dataset construction, linguistic annotation, dialog/semantic schemas, and automatic processing of large datasets to advance NLP and ML. Responsibilities include designing data collection tasks, analyzing data for insights, building data analysis tools, and collaborating with scientists to evaluate language models.

What you'd actually do

  1. Design data collection/creation tasks in response to science needs: author instructions, define and implement quality targets and mechanisms, provide day-to-day coordination of data collection efforts (including planning, scheduling, and reporting), and be responsible for the final deliverables
  2. Analyze and extract language-related insights from large amounts of data
  3. Build tools or tool prototypes for data analysis or data authoring, using Python or another scripting language
  4. Use modeling tools to bootstrap or test new functionalities
  5. Collaborate with scientists and software engineers to evaluate the performance of language models
  6. Handle competing requests from a range of data customers

Skills

Required

  • language annotation and other forms of data markup
  • scripting language (e.g., Python, KornShell)
  • speech and text language data in multiple languages
  • fast-paced, team environment
  • Master’s or higher degree in a relevant field (computational linguistics or an equivalent field with computational analysis)
  • 2+ years experience in computational linguistics or language data processing
  • Excellent communication, strong organizational skills, and very detail-oriented

Nice to have

  • writing grammars and building finite-state transducers (FSTs)
  • statistical language modeling
  • PhD in Computational Linguistics (or equivalent field with computational emphasis)
  • bootstrapping language data collections in a quickly changing environment
  • version control and agile development
  • database queries and data analysis processes (SQL, R, Matlab, etc.)
  • Willingness to support several projects at one time, and to accept reprioritization as necessary
  • Able to think creatively, with strong analytical and problem-solving skills

What the JD emphasized

  • Master’s or higher degree in a relevant field (computational linguistics or an equivalent field with computational analysis)
  • 2+ years experience in computational linguistics or language data processing

Other signals

  • dataset construction
  • linguistic annotation
  • dialog/semantic schemas
  • automatic processing of large datasets
  • natural language processing
  • machine learning
  • AI systems aligned with human policies and preferences