Applied Scientist, Observability and Triage, Prime Video

Amazon · Big Tech · London, United Kingdom · Applied Science

Applied Scientist role focused on building generative AI and large model systems for automated incident triage, root cause analysis, and resolution recommendation within Prime Video's observability and operational systems. The role involves prototyping, evaluating hypotheses, building evaluation frameworks, and collaborating with engineering teams to integrate ML models into production.

What you'd actually do

  1. Design and develop machine learning and generative AI systems for automated incident triage, root cause analysis, and resolution recommendation at scale
  2. Rapidly prototype and evaluate hypotheses in a high-ambiguity environment, leveraging both quantitative experimentation and domain expertise in operational systems
  3. Build evaluation frameworks (including LLM-as-a-Judge approaches) to measure model performance on incident triage and root cause prediction
  4. Collaborate with software engineering teams to integrate ML models into production observability systems serving hundreds of development teams
  5. Communicate results and insights to both technical and non-technical audiences, including through publications, presentations, and written reports

Skills

Required

  • Experience programming in Java, C++, Python, or a related language
  • Experience building machine learning models for business applications
  • PhD or Master's degree in CS, CE, or ML, or equivalent relevant work experience

Nice to have

  • Experience in any of the following areas: algorithms and data structures, parsing, numerical optimization, data mining, parallel and distributed computing, high-performance computing
  • Experience using Unix/Linux
  • Experience in professional software development

What the JD emphasized

  • generative AI
  • large models
  • automated incident triage
  • root cause analysis
  • resolution recommendation
  • LLM-as-a-Judge
