Senior Software Engineer, Core Infrastructure

Harvey · AI Frontier · San Francisco, CA · Engineering

The Software Engineer on the Core Infrastructure team is responsible for designing, building, and scaling the underlying systems that power Harvey's global legal AI platform. The work covers multi-cloud infrastructure, Kubernetes, networking, observability, and distributed systems, all managed for reliability and efficiency, with a focus on supporting high-throughput inference requests and enterprise-grade requirements.

What you'd actually do

  1. Design and build scalable, fault-tolerant infrastructure systems that power Harvey's AI platform across multiple cloud regions
  2. Own and evolve our multi-cloud infrastructure (Azure, GCP), including Kubernetes orchestration, networking, and container management
  3. Lead technical initiatives around observability, incident response, and operational excellence — building systems that enable rapid detection and resolution of issues
  4. Architect and optimize our distributed systems for reliability, including load balancing, quota management, and failover mechanisms
  5. Partner with Product Engineering and Security teams to ensure our infrastructure is an accelerant, not a constraint

Skills

Required

  • 4+ years of experience in Infrastructure Engineering or Platform Engineering in a production environment
  • Long track record of building and scaling complex, large-scale distributed systems
  • Deep proficiency with cloud infrastructure platforms (Azure preferred; GCP or AWS experience transfers well)
  • Strong fluency in Infrastructure as Code (IaC) tools — Terraform, Pulumi, or CloudFormation
  • Solid understanding of Kubernetes, container orchestration, networking, and cloud security at scale
  • Experience with observability tools (Datadog, Sentry) and incident response practices (PagerDuty, Incident.io)
  • Strong programming skills in Python, Go, or similar languages
  • Excellent problem-solving skills, a "spidey sense" of where things could go wrong, and a commitment to operational excellence

Nice to have

  • Experience building infrastructure for AI/ML workloads or high-throughput inference systems
  • Background with distributed rate limiting, load balancing, or quota management systems
  • Experience operating multi-tenant platforms with strict security and compliance requirements
  • Track record of leading complex cross-functional projects and delivering measurable impact

What the JD emphasized

  • critical role in designing and building new infrastructure systems
  • scaling and strengthening our existing infrastructure
  • foundation that powers every user interaction with Harvey — processing billions of prompt tokens and millions of daily requests
  • environment balanced between innovation — building new systems — and operational excellence
  • resilient and efficient as it scales products, regions, customers, and usage
  • reliability, scalability, and security of our platform
  • next-generation model proxy architecture that routes millions of daily inference requests
  • distributed rate limiting and quota management systems
  • multi-region deployment strategies that meet strict data residency requirements
  • comprehensive observability infrastructure
  • high-throughput inference systems