Software Engineer

Harvey · AI Frontier · San Francisco, CA · Engineering

Software Engineer role focused on designing, developing, and deploying infrastructure services and automation tools to support platform growth and new product initiatives for an enterprise AI company. Responsibilities include managing and optimizing infrastructure, leading incident management, evaluating infrastructure decisions, and collaborating on reliability, security, and compliance.

What you'd actually do

  1. Design, develop, and deploy new infrastructure services and automation tools to support platform growth and new product initiatives.
  2. Manage and optimize existing infrastructure components (compute, storage, networking) across 50+ global regions.
  3. Lead and participate in incident management, including conducting postmortems and root cause analyses and implementing long-term improvements.
  4. Evaluate infrastructure decisions and capacity planning strategies to improve reliability, scalability, and performance.
  5. Collaborate across teams to drive reliability, security, and compliance throughout the software lifecycle.

Skills

Required

  • Bachelor’s degree (or foreign equivalent) in Computer Science or a related field, plus four (4) years of experience in the job offered or in a related Software Engineering role.
  • Designing and operating complex, large-scale distributed systems in production, including service discovery, load balancing, high availability, and disaster recovery across multi-region or multi-availability-zone deployments.
  • Implementing Infrastructure as Code (IaC) using tools such as Terraform or Pulumi, including authoring reusable modules, performing code reviews, and executing change management with drift detection and automated policy checks.
  • Administering Kubernetes in production, including cluster provisioning and upgrades, workload orchestration and autoscaling, Helm-based packaging, and network policy configuration.
  • Building internal automation and platform tooling using scripting or programming languages such as Python, Bash, or Rust, including developing command-line tools or services that interact with cloud and Kubernetes APIs and implementing automated tests.
  • Configuring and operating observability stacks, including metrics, logs, and distributed tracing (e.g., Datadog, OpenTelemetry, Sentry), defining SLIs/SLOs, and creating actionable alerts integrated with incident response tooling (e.g., PagerDuty or Incident.io).
  • Designing and maintaining CI/CD pipelines (e.g., GitHub Actions or Buildkite), including build, test, and deployment automation, artifact management, and progressive delivery strategies (blue/green or canary).
  • Engineering cloud infrastructure on at least one major cloud platform (AWS, GCP, or Azure), including compute, networking (VPC/VNet design, routing, load balancing, and peering), identity and access management, and object/block storage.
  • Managing operational data stores and caches (e.g., PostgreSQL or MySQL; Redis; and a document or key-value store such as MongoDB or DynamoDB), including replication/backup configuration, schema or data modeling, and performance tuning.
  • Implementing network and platform security controls, including secrets management (e.g., CKMS, EKMS, CMEK), OS hardening and patching, least-privilege IAM policy design, and vulnerability remediation workflows with CI/CD gates.

What the JD emphasized

  • Designing and operating complex, large-scale distributed systems in production
  • Implementing Infrastructure as Code (IaC)
  • Administering Kubernetes in production
  • Building internal automation and platform tooling
  • Configuring and operating observability stacks
  • Designing and maintaining CI/CD pipelines
  • Engineering cloud infrastructure on at least one major cloud platform
  • Managing operational data stores and caches
  • Implementing network and platform security controls