What you'd actually do

Design and develop machine learning and generative AI systems for automated incident triage, root cause analysis, and resolution recommendation at scale

Rapidly prototype and evaluate hypotheses in a high-ambiguity environment, leveraging both quantitative experimentation and domain expertise in operational systems

Build evaluation frameworks (including LLM-as-a-Judge approaches) to measure model performance across triage accuracy and root cause prediction

Collaborate with software engineering teams to integrate ML models into production observability systems serving hundreds of development teams

Communicate results and insights to both technical and non-technical audiences, including through publications, presentations, and written reports

Skills

Required

Experience programming in Java, C++, Python, or a related language
Experience in building machine learning models for business applications
PhD or Master's degree in CS, CE, ML, or equivalent relevant work experience

Nice to have

Experience in any of the following areas: algorithms and data structures, parsing, numerical optimization, data mining, parallel and distributed computing, high-performance computing
Experience using Unix/Linux
Experience in professional software development

Come build the future of entertainment with us. Are you interested in shaping the future of movies and television? Do you want to define the next generation of how and what Amazon customers are watching?

Prime Video is a premium streaming service that offers customers a vast collection of TV shows and movies - all with the ease of finding what they love to watch in one place. We offer customers thousands of popular movies and TV shows, from Amazon Originals and exclusive licensed content to exciting live sports events. We also offer our members the opportunity to subscribe to add-on channels, which they can cancel at any time, and to rent or buy new release movies and TV box sets on the Prime Video Store. Prime Video is a fast-paced, growth business - available in over 200 countries and territories worldwide. The team works in a dynamic environment where innovating on behalf of our customers is at the heart of everything we do. If this sounds exciting to you, please read on.

The Observability and Triage team is looking for an Applied Scientist, experienced in generative AI and large models, for our London office. This is a wide-impact role working with development teams across the UK, India, and the US. This greenfield project will deliver features that reduce the operational load for internal Prime Video builders. To achieve this, you will develop AI-driven solutions that automatically detect anomalies, identify root causes, recommend resolution paths, and take action on operational incidents. We consume petabytes of data daily across multiple metric, log, and data-based events, and you would experiment with how to shape the future through this data.

You will have strong technical ability, excellent teamwork and communication skills, and a strong motivation to deliver customer value from your research. Our position offers opportunities to grow your technical and non-technical skills and make a global impact.

Key job responsibilities

Design and develop machine learning and generative AI systems for automated incident triage, root cause analysis, and resolution recommendation at scale
Rapidly prototype and evaluate hypotheses in a high-ambiguity environment, leveraging both quantitative experimentation and domain expertise in operational systems
Build evaluation frameworks (including LLM-as-a-Judge approaches) to measure model performance across triage accuracy and root cause prediction
Collaborate with software engineering teams to integrate ML models into production observability systems serving hundreds of development teams
Communicate results and insights to both technical and non-technical audiences, including through publications, presentations, and written reports

A day in the life

On a typical day, you analyse patterns across thousands of operational incidents to improve an automated triage model, then design an experiment to test whether a new generative AI-based approach better identifies root causes for complex multi-service incidents. Your internal customers are Prime Video development teams who rely on your solutions to reduce the time and effort spent responding to operational events. You will collaborate closely with software engineers and operational stakeholders across the world to ensure your research translates into production systems that measurably reduce customer impact.

About the team

Our team builds AI-powered observability and triage solutions for Prime Video development teams, consuming petabytes of data daily to automatically detect, diagnose, and recommend resolutions for operational incidents.

Basic Qualifications

Experience programming in Java, C++, Python, or a related language
Experience in any of the following areas: algorithms and data structures, parsing, numerical optimization, data mining, parallel and distributed computing, high-performance computing
Experience in building machine learning models for business applications
PhD or Master's degree in CS, CE, ML, or equivalent relevant work experience

Preferred Qualifications

Experience using Unix/Linux
Experience in professional software development

Amazon is an equal opportunities employer. We believe passionately that employing a diverse workforce is central to our success. We make recruiting decisions based on your experience and skills. We value your passion to discover, invent, simplify and build. Protecting your privacy and the security of your data is a longstanding top priority for Amazon. Please consult our Privacy Notice (https://www.amazon.jobs/en/privacy_page) to know more about how we collect, use and transfer the personal data of our candidates.

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.

Applied Scientist, Observability and Triage, Prime Video
