Staff Infrastructure Software Engineer, Enterprise AI

Scale AI · Data AI · San Francisco, CA · Enterprise Engineering

Staff Infrastructure Software Engineer focused on building and scaling the 'paved road' for knowledge retrieval and inference engines, defining deployment standards for Agentic workflows, and architecting multi-cloud systems for enterprise AI in regulated industries.

What you'd actually do

  1. Architect multi-cloud systems and abstractions that allow the SGP platform to run on top of existing cloud providers.
  2. Define the architectural patterns for our multi-cloud infrastructure to support secure, reliable, and scalable Agentic workflows for enterprise customers.
  3. Design and champion highly scalable, reliable, and low-latency infrastructure and frameworks for building, orchestrating, and evaluating multi-agent systems at enterprise scale.
  4. Own the development and maintenance of our best-in-class Agentic observability platform (logging, metrics, tracing, and analytics) to proactively ensure system health and enable rapid incident response.
  5. Improve developer workflows and operational efficiency by building automated tooling and championing Infrastructure-as-Code (IaC) practices throughout the engineering organization.

Skills

Required

  • modern infrastructure practices
  • CI/CD
  • IaC (e.g., Terraform, Helm Charts)
  • container orchestration (e.g., Kubernetes)
  • observability platforms (e.g., Datadog, Prometheus, Grafana)
  • a major cloud provider (AWS, Azure, or GCP)
  • security and compliance in enterprise environments
  • access management
  • data isolation
  • customer-specific VPC setups
  • Python or JavaScript/TypeScript
  • SQL

Nice to have

  • Agents
  • LLMs
  • vector databases
  • emerging AI technologies

What the JD emphasized

  • highly-regulated industries
  • secure, reliable, and scalable Agentic workflows
  • highly scalable, reliable, and low-latency infrastructure
  • compliance, privacy, and security standards

Other signals

  • architecting and implementing solutions for enterprise AI orchestration
  • defining deployment standards for Agentic workflows at scale
  • building, orchestrating, and evaluating multi-agent systems