Product Manager, Agent Harness

Cursor · Coding AI · San Francisco, CA · Product Management

Product Manager for Cursor's Agent Harness, the framework that enables AI agents to decompose tasks, interact with the file system and terminal, handle failures, and be observed and steered by developers. The role involves turning research advances into product, analyzing agent traces, designing evaluation frameworks, and defining agent extensibility primitives.

What you'd actually do

  1. Owning the agent planning and execution framework: how agents decompose tasks, decide which tools to use, and recover when a step fails. Balancing autonomy with predictability.
  2. Designing how developers observe and steer agents: real-time progress, guardrails, the ability to redirect mid-task. The experience should build trust without requiring micromanagement.
  3. Building evaluation and benchmarking systems: defining what "good" means for agent quality—task completion rate, error recovery, hallucination frequency—and building the harnesses to measure it. These measurements drive engineering and research priorities.
  4. Analyzing agent traces at scale: identifying where agents get stuck, loop, hallucinate, or take unproductive paths, and turning those patterns into concrete improvements.
  5. Defining the primitives for agent extensibility: how agents use tools, access codebase context, call external services via MCPs and plugins on the Cursor Marketplace, and how developers customize agent behavior through rules and constraints.
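
As a rough illustration of the measurements named in items 3 and 4, here is a minimal Python sketch of task completion rate, error recovery rate, and a simple loop check over agent traces. The `Step` and `Trace` shapes, the tool names, and the "same call three times in a row" loop heuristic are all assumptions made for this example; they are not Cursor's trace format or metric definitions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    tool: str   # hypothetical tool name, e.g. "read_file"
    args: str   # serialized arguments for the call
    ok: bool    # whether this step succeeded


@dataclass
class Trace:
    steps: List[Step]
    completed: bool  # whether the overall task finished successfully


def completion_rate(traces: List[Trace]) -> float:
    """Fraction of traces whose task completed."""
    return sum(t.completed for t in traces) / len(traces) if traces else 0.0


def recovery_rate(traces: List[Trace]) -> float:
    """Among traces with at least one failed step, the fraction that still completed."""
    failed = [t for t in traces if any(not s.ok for s in t.steps)]
    return sum(t.completed for t in failed) / len(failed) if failed else 0.0


def has_loop(trace: Trace, repeats: int = 3) -> bool:
    """Assumed heuristic: the same (tool, args) call appears `repeats` times in a row."""
    run = 1
    for prev, cur in zip(trace.steps, trace.steps[1:]):
        run = run + 1 if (prev.tool, prev.args) == (cur.tool, cur.args) else 1
        if run >= repeats:
            return True
    return False


# Three made-up traces: a clean success, a recovery after a failure, and a stuck loop.
traces = [
    Trace([Step("read_file", "a.py", True)], completed=True),
    Trace([Step("run_tests", "", False), Step("edit_file", "a.py", True)], completed=True),
    Trace([Step("run_tests", "", False)] * 3, completed=False),
]
looping = [has_loop(t) for t in traces]
```

Hallucination frequency is left out on purpose: it needs a labeled ground truth or a judge, which a structural check like this cannot supply.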

Skills

Required

  • Product Management
  • AI Agents
  • LLM Applications
  • Developer Tools
  • Technical Depth
  • Code Analysis
  • System Behavior Reasoning
  • Evaluation and Measurement
  • Metric Definition
  • Research-Adjacent Environments
  • Reinforcement Learning
  • Agent Frameworks
  • AI Evaluation

Nice to have

  • Agent Harness Design
  • Task Decomposition
  • Failure Handling
  • Observability
  • Steering Agents
  • Benchmarking Systems
  • Agent Trace Analysis
  • Agent Extensibility Primitives
  • Tool Use
  • Codebase Context Access
  • MCPs
  • Plugins
  • Cursor Marketplace
  • Customization Rules
  • Constraints
  • Multi-agent Coordination

What the JD emphasized

  • built or evaluated AI agents
  • AI agents
  • LLM applications
  • ML-powered developer tools
  • deeply technical
  • comfortable reading code
  • analyzing traces
  • reasoning about system behavior
  • strong intuition for evaluation and measurement
  • define metrics that capture quality
  • comfortable in a research-adjacent environment
  • experience with reinforcement learning
  • agent frameworks
  • AI evaluation
  • practitioner
  • working closely with researchers

Other signals

  • AI agents
  • LLM applications
  • developer tools
  • agent harness
  • evaluation frameworks
  • multi-agent coordination