Staff Software Engineer, Core Infrastructure

Harvey · AI Frontier · Bangalore, India · Engineering

The Staff Software Engineer on the Core Infrastructure team is responsible for designing, building, scaling, and strengthening the infrastructure systems that power Harvey's AI platform. The work covers multi-cloud infrastructure (Azure, GCP), Kubernetes, networking, container management, observability, incident response, and distributed systems optimization. The role also involves driving infrastructure-as-code practices and mentoring engineers. Representative projects include building a model proxy architecture, rate limiting systems, multi-region deployments, and observability infrastructure.

What you'd actually do

  1. Design and build scalable, fault-tolerant infrastructure systems that power Harvey's AI platform across multiple cloud regions
  2. Own and evolve our multi-cloud infrastructure (Azure, GCP), including Kubernetes orchestration, networking, and container management
  3. Lead technical initiatives around observability, incident response, and operational excellence — building systems that enable rapid detection and resolution of issues
  4. Architect and optimize our distributed systems for reliability, including load balancing, quota management, and failover mechanisms
  5. Partner with Product Engineering and Security teams to ensure our infrastructure is an accelerant, not a constraint

Skills

Required

  • Infrastructure Engineering
  • Platform Engineering
  • Cloud infrastructure platforms (Azure, GCP, AWS)
  • Kubernetes
  • Container orchestration
  • Networking
  • Cloud security
  • Infrastructure as Code (IaC)
  • Terraform
  • Pulumi
  • CloudFormation
  • Observability tools (Datadog, Sentry)
  • Incident response practices (PagerDuty, Incident.io)
  • Python
  • Go

Nice to have

  • Experience building infrastructure for AI/ML workloads or high-throughput inference systems
  • Distributed rate limiting
  • Load balancing
  • Quota management systems
  • Operating multi-tenant platforms with strict security and compliance requirements
  • Leading complex cross-functional projects

What the JD emphasized

  • 8+ years of experience in Infrastructure Engineering or Platform Engineering in a production environment
  • Long track record building and scaling complex, large-scale distributed systems
  • Deep proficiency with cloud infrastructure platforms (Azure preferred; GCP or AWS experience transfers well)
  • Strong fluency in Infrastructure as Code (IaC) tools — Terraform, Pulumi, or CloudFormation
  • Solid understanding of Kubernetes, container orchestration, networking, and cloud security at scale
  • Experience with observability tools (Datadog, Sentry) and incident response practices (PagerDuty, Incident.io)
  • Strong programming skills in Python, Go, or similar languages
  • Excellent problem-solving skills, a "spidey sense" of where things could go wrong, and a commitment to operational excellence