Staff Software Engineer, Agents

Harvey · AI Frontier · New York, NY · Engineering

This role focuses on building and optimizing agentic AI systems for legal professionals. Responsibilities include designing agent environments and actions, managing context windows, creating tools, developing evaluations that speed up iteration, and optimizing agent performance through prompt engineering, model selection, and tool design. The role also involves working with infrastructure teams on low-latency agent execution and improving observability.

What you'd actually do

  1. Partner with customers and PMs to understand legal workflows, design practical evaluations that capture what “excellent” means, and ship agents that get the job done.
  2. Optimize agent performance through prompt engineering, model selection, tool design, skill writing, context window management, and eval harness development.
  3. Work with our model infra team to design and implement infrastructure for low-latency agent execution, including caching strategies, parallel tool calls, or subagent patterns.
  4. Improve our observability and instrumentation to profile agent behavior, identify bottlenecks, and drive optimization decisions.
  5. Stay current on new developments in agentic systems and bring those learnings back to the products we build.

Skills

Required

  • Python
  • LLM APIs
  • agent frameworks
  • shipping user-facing products
  • prompt engineering
  • model selection
  • tool design
  • eval harness development
  • low-latency agent execution
  • observability

Nice to have

  • legal workflows
  • context window management
  • caching strategies
  • parallel tool calls
  • subagent patterns

What the JD emphasized

  • build the systems that make our AI agents indispensable
  • design environments and actions for agentic professional work
  • make model selection decisions
  • create optimal tools
  • develop evals that enable faster iteration loops to unlock new capabilities
  • immersed in the space
  • driven to ship impactful products
  • experienced in using practical evaluations to drive task completion quality and customer delight
  • automating information requests and diligence checks across hundreds of thousands of files with retrieval and file editing agents
  • improving the latency and quality of agents that apply a standard legal “playbook” to contracts
  • optimizing our multi-source retrieval agents
  • tuning the harness and libraries for coding agents
  • Passion for building effective domain-specific agents
  • Iterative mindset
  • develop proofs of concept
  • make decisions quickly
  • ship v0s
  • Comfortable judging when and how to use evaluations to drive quality
  • Humble and adaptable about code and frameworks
  • drive adoption of new best practices
  • Proficiency in Python
  • experience working with LLM APIs and agent frameworks
  • Experience with shipping user-facing products

Other signals

  • building agentic AI
  • enterprise-grade platform
  • scaling fast
  • defining a new category
  • shipping impactful products
  • practical evaluations to drive task completion quality and customer delight