Senior Applied Scientist, Amazon AWS Agentic AI, AWS AI Fundamental Research

Amazon · Big Tech · Santa Clara, CA · Data Science

This role focuses on leading the design and development of agentic evaluation frameworks and training evaluation/critic models to assess the quality and effectiveness of AI agents. The scientist will define methodologies, create benchmarks, build automated systems, and conduct research to advance agent and evaluation science. The role involves end-to-end ownership from research to production deployment, collaborating with engineering to deliver these capabilities as managed AWS services. It also includes mentoring junior scientists and contributing to the research community.

What you'd actually do

  1. Lead the design and development of agentic evaluation frameworks, and the training of evaluation/critic models, to assess the quality and effectiveness of AI agents at scale.
  2. Define evaluation methodologies, create benchmarks, and build evaluation models and automated systems that measure agent performance across critical dimensions.
  3. Stay at the forefront of a rapidly evolving field by studying and adopting state-of-the-art methods and conducting original research to advance the science of agents and evaluation.
  4. Own the end-to-end lifecycle, from research and data curation through model training to production deployment, working closely with engineering to deliver evaluation capabilities as managed AWS services.
  5. Collaborate with cross-functional stakeholders to translate scientific insights into actionable improvements, mentor junior scientists, and contribute to the broader research community.

Skills

Required

  • Experience building machine learning models for business applications
  • PhD, or Master's degree and 6+ years of applied research experience
  • Experience programming in Java, C++, Python or related language
  • Experience with deep learning methods and machine learning

Nice to have

  • Modeling tools such as R, scikit-learn, Spark MLlib, MXNet, TensorFlow, NumPy, and SciPy
  • Large-scale distributed systems such as Hadoop and Spark

What the JD emphasized

  • agentic evaluation frameworks
  • evaluation/critic model training
  • agent performance
  • science of agents and evaluation
  • managed AWS services

Other signals

  • leading the design and development of agentic evaluation frameworks
  • training evaluation/critic models
  • defining evaluation methodologies and creating benchmarks
  • building evaluation models and automated systems
  • researching and building innovative solutions using Agentic AI
  • delivering evaluation capabilities as managed AWS services