Language Engineer, Artificial General I… at Amazon

What you'd actually do

Design complex data collections with human participants in response to science needs: author instructions, define and implement quality targets and mechanisms, provide day-to-day coordination of data collection efforts (including planning, scheduling, and reporting), and be responsible for the final deliverables

Design and conduct complex data creation tasks using synthetic and model-based data generation methods, following state-of-the-art approaches

Analyze and extract insights from large amounts of data

Build tools or tool prototypes for data analysis or data creation, using Python or another scripting language

Use modeling tools to bootstrap or test new AI functionalities

Skills

Required

Experience owning and executing language data collection projects, including guidelines, labelset and annotation workflow development
Master's or higher degree in a relevant field (Computational Linguistics or equivalent field with computational analysis)
2+ years of experience in computational linguistics, language data processing, or AI data creation
Experience with language data annotation systems and other forms of data markup
Proficient with scripting languages, such as Python
Experience working with speech, text, and multimodal data in multiple languages
Excellent communication, strong organizational skills, and close attention to detail
Comfortable working in a fast-paced, highly collaborative, dynamic work environment

Nice to have

PhD in Computational Linguistics (or equivalent field with computational emphasis)
Expertise in bootstrapping AI data collections for quickly evolving requirements
Extensive experience working with speech, text, and multimodal data in multiple languages
Experience in data creation for complex agentic workflows
Practical experience with machine learning and technical concepts such as APIs
Practical knowledge of version control and agile development; familiarity with database queries and data analysis processes (SQL, R, Matlab, etc.)

The Amazon Artificial General Intelligence (AGI) Data Services organization is responsible for developing diverse datasets to train and evaluate the Amazon AI models. We are looking for Language Engineers to join our science and engineering team to support the development of complex, multimodal datasets, using a range of approaches including synthetic data generation, model-supported data generation, and human-in-the-loop data collections.

You will play a critical role in driving innovation and advancing the state of the art in evaluating and training AI models. You will work closely with cross-functional teams, including product managers, engineers, and data scientists, to ensure that our AI systems are best in class.

Key job responsibilities

Design complex data collections with human participants in response to science needs: author instructions, define and implement quality targets and mechanisms, provide day-to-day coordination of data collection efforts (including planning, scheduling, and reporting), and be responsible for the final deliverables
Design and conduct complex data creation tasks using synthetic and model-based data generation methods, following state-of-the-art approaches
Analyze and extract insights from large amounts of data
Build tools or tool prototypes for data analysis or data creation, using Python or another scripting language
Use modeling tools to bootstrap or test new AI functionalities
Collaborate with scientists, software engineers, and other data creators to evaluate performance of AI models

About the team

Amazon strives to be the world’s most customer-centric company, where customers can research and purchase anything they might want online or offline. We set big goals and are looking for people who can help us reach and exceed them. The AGI organization provides AI capabilities for a variety of Amazon products and searches. We provide secure, flexible, cost-effective, and high-quality data development services to our customers that enable them to build advanced ML models.

Basic Qualifications

Experience owning and executing language data collection projects, including guidelines, labelset and annotation workflow development
Master's or higher degree in a relevant field (Computational Linguistics or equivalent field with computational analysis)
2+ years of experience in computational linguistics, language data processing, or AI data creation
Experience with language data annotation systems and other forms of data markup
Proficient with scripting languages, such as Python
Experience working with speech, text, and multimodal data in multiple languages
Excellent communication, strong organizational skills, and close attention to detail
Comfortable working in a fast-paced, highly collaborative, dynamic work environment

Preferred Qualifications

PhD in Computational Linguistics (or equivalent field with computational emphasis)
Expertise in bootstrapping AI data collections for quickly evolving requirements
Extensive experience working with speech, text, and multimodal data in multiple languages
Experience in data creation for complex agentic workflows
Practical experience with machine learning and technical concepts such as APIs
Practical knowledge of version control and agile development; familiarity with database queries and data analysis processes (SQL, R, Matlab, etc.)

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Los Angeles County applicants: Job duties for this position include: work safely and cooperatively with other employees, supervisors, and staff; adhere to standards of excellence despite stressful conditions; communicate effectively and respectfully with employees, supervisors, and staff to ensure exceptional customer service; and follow all federal, state, and local laws and Company policies. Criminal history may have a direct, adverse, and negative relationship with some of the material job duties of this position. These include the duties and responsibilities listed above, as well as the abilities to adhere to company policies, exercise sound judgment, effectively manage stress and work safely and respectfully with others, exhibit trustworthiness and professionalism, and safeguard business operations and the Company’s reputation. Pursuant to the Los Angeles County Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.

The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at https://amazon.jobs/en/benefits.

USA, CA, Sunnyvale - 86,500.00 - 151,400.00 USD annually
USA, MA, Boston - 75,200.00 - 131,600.00 USD annually
USA, WA, Bellevue - 82,700.00 - 131,600.00 USD annually

Language Engineer, Artificial General Intelligence - Data Services
