Agent Evaluation Engineer

Comcast · Media · Washington, DC

This role focuses on building and managing evaluation pipelines, metrics, and automated systems to test the behavior, accuracy, and reliability of AI agents before release. It involves defining benchmarks, curating datasets, integrating evaluation into CI/CD, and monitoring agents in production.
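As a rough illustration of what "integrating evaluation into CI/CD" can boil down to, here is a minimal sketch. Everything in it — run_agent, the golden test cases, the keyword check, the 0.9 threshold — is a hypothetical stand-in rather than anything from the posting; a real gate would call the deployed agent and use richer scoring than keyword matching.

```python
# Minimal sketch of a release-gate evaluation that could run as a CI/CD step.
# All names here (run_agent, the golden cases, the 0.9 threshold) are
# illustrative assumptions, not details from the posting.

GOLDEN_SET = [
    {"prompt": "How do I reset my modem?", "must_contain": "unplug"},
    {"prompt": "Why is my bill higher this month?", "must_contain": "bill"},
]

PASS_RATE_THRESHOLD = 0.9  # hypothetical quality bar a release must clear


def run_agent(prompt: str) -> str:
    """Stand-in for the agent under test; a real pipeline would call the deployed agent."""
    return f"To start, unplug your modem for 30 seconds. (question was: {prompt})"


def keyword_pass(response: str, must_contain: str) -> bool:
    """Toy relevance check: does the response mention the expected keyword?"""
    return must_contain.lower() in response.lower()


def main() -> None:
    results = [
        keyword_pass(run_agent(case["prompt"]), case["must_contain"])
        for case in GOLDEN_SET
    ]
    pass_rate = sum(results) / len(results)
    print(f"pass rate: {pass_rate:.2%} on {len(results)} golden cases")
    # A non-zero exit fails the CI job, blocking the release.
    if pass_rate < PASS_RATE_THRESHOLD:
        raise SystemExit(1)


if __name__ == "__main__":
    main()
```

In this role, a check like this could plausibly run on every merge and again against staging before promotion, per the development/staging/production scope described above.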

What you'd actually do

  1. Design and develop agent evaluation pipelines across development, staging, and production environments
  2. Define and standardize evaluation metrics and benchmarks for conversational AI quality (accuracy, relevance, customer experience, safety)
  3. Build automated and human-in-the-loop evaluation systems to assess agent performance
  4. Manage and curate evaluation datasets, test sets, and annotation workflows
  5. Enable continuous evaluation and monitoring of agents in production (sketched below)
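For item 5 (the sketch referenced above), continuous monitoring often starts as something equally simple: aggregate per-conversation quality scores, whether they come from automated checks or human annotators, and alert when they drop against a baseline. The record fields, baseline, and alert margin below are assumptions for illustration only.

```python
# Minimal sketch of continuous production monitoring: aggregate per-conversation
# quality scores (from automated judges and/or human annotation) and flag a
# regression against a baseline. Field names, the baseline, and the alert
# margin are all hypothetical.

from statistics import mean

BASELINE_SCORE = 0.85   # assumed mean score from the last accepted release
ALERT_MARGIN = 0.05     # assumed tolerated drop before alerting

# In practice these records would come from production logs or an annotation workflow.
production_scores = [
    {"conversation_id": "c-101", "score": 0.92, "source": "auto"},
    {"conversation_id": "c-102", "score": 0.70, "source": "human"},
    {"conversation_id": "c-103", "score": 0.88, "source": "auto"},
]


def check_for_regression(records: list[dict]) -> bool:
    """Return True if the mean quality score has dropped below the alert line."""
    current = mean(r["score"] for r in records)
    print(f"mean score: {current:.2f} (baseline {BASELINE_SCORE:.2f})")
    return current < BASELINE_SCORE - ALERT_MARGIN


if __name__ == "__main__":
    if check_for_regression(production_scores):
        print("ALERT: possible quality regression; route recent samples for human review")
```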

Skills

Required

  • AI Agents
  • Benchmarking
  • CI/CD
  • Evaluation Metrics
  • Large Language Models (LLMs)
  • Machine Learning (ML)

Nice to have

  • Customer support AI or chatbot platforms
  • Responsible AI (bias, fairness, hallucination mitigation)

What the JD emphasized

  • AI agents
  • evaluation
  • agent evaluation
  • conversational AI quality

Other signals

  • AI agents
  • evaluation pipelines
  • metrics and benchmarks
  • automated and human-in-the-loop evaluation systems
  • continuous evaluation and monitoring