Staff Software Engineer, Core Infrastructure

Harvey · AI Frontier · San Francisco, CA · Engineering

As a Staff Software Engineer on the Core Infrastructure team, you will design, build, scale, and strengthen the infrastructure systems that power Harvey's AI platform. The work spans multi-cloud infrastructure, Kubernetes, observability, incident response, and distributed systems optimization, with the goal of ensuring reliability, scalability, and security for enterprise customers.

What you'd actually do

  1. Design and build scalable, fault-tolerant infrastructure systems that power Harvey's AI platform across multiple cloud regions
  2. Own and evolve our multi-cloud infrastructure (Azure, GCP), including Kubernetes orchestration, networking, and container management
  3. Lead technical initiatives around observability, incident response, and operational excellence — building systems that enable rapid detection and resolution of issues
  4. Architect and optimize our distributed systems for reliability, including load balancing, quota management, and failover mechanisms
  5. Partner with Product Engineering and Security teams to ensure our infrastructure is an accelerant, not a constraint

Skills

Required

  • 10+ years of experience in Infrastructure Engineering or Platform Engineering
  • Experience building and scaling complex, large-scale distributed systems
  • Experience with cloud infrastructure platforms (Azure preferred; GCP or AWS experience transfers well)
  • Experience with Infrastructure as Code (IaC) tools such as Terraform, Pulumi, or CloudFormation
  • Experience with Kubernetes, container orchestration, networking, and cloud security at scale
  • Experience with observability tools (Datadog, Sentry) and incident response practices (PagerDuty, Incident.io)
  • Experience with Python, Go, or similar languages
  • Problem-solving skills
  • Operational excellence

Nice to have

  • Experience building infrastructure for AI/ML workloads or high-throughput inference systems
  • Experience with distributed rate limiting, load balancing, or quota management systems
  • Experience operating multi-tenant platforms with strict security and compliance requirements
  • Experience leading complex cross-functional projects and delivering measurable impact

What the JD emphasized

  • 10+ years of experience in Infrastructure Engineering or Platform Engineering in a production environment
  • Long track record building and scaling complex, large-scale distributed systems