Senior Software Engineer, Infrastructure

Decagon · Vertical AI · San Francisco, CA · Engineering

The Senior Infrastructure Engineer designs, builds, and operates production infrastructure for high-scale, low-latency AI systems, with a focus on ML serving platforms for LLM inference. The role involves optimizing performance, ensuring high availability, and supporting a range of deployment architectures, including on-prem and air-gapped environments.

What you'd actually do

  1. Design and implement critical infrastructure services with strong SLOs, clear runbooks, and actionable telemetry.
  2. Partner with research and product teams to architect solutions, set up prototypes, evaluate performance, and scale new features.
  3. Tune service latencies: optimize networking paths, apply smart caching/queuing, and tune CPU/memory/I/O for tight p95/p99s.
  4. Evolve CI/CD, golden paths, and self‑service tooling to improve developer velocity and safety.
  5. Support a range of customer deployment architectures, including on-prem and air-gapped environments, with robust observability and upgrade paths.

Skills

Required

  • 5+ years building and operating production infrastructure at scale
  • Depth in at least one area across Core/Data/AI-ML/Platform/Voice
  • Proven track record of meeting high availability and low latency targets (owning SLOs, p95/p99, and load testing)
  • Excellent observability chops (OpenTelemetry, Prometheus/Grafana, Datadog) and incident response (PagerDuty, SLO/error budgets)
  • Clear written communication and the ability to turn ambiguous requirements into simple, reliable designs
  • Infrastructure-as-code (Terraform)
  • GitOps practices

Nice to have

  • Experience being an early backend/platform/infrastructure engineer at another company
  • Strong Kubernetes experience (GKE/EKS/AKS)
  • Experience across multiple cloud providers (GCP, AWS, and Azure)
  • Experience with customer‑managed deployments

What the JD emphasized

  • high availability
  • low latency

Other signals

  • ML Infra: GPU and model-serving platforms for LLM inference with multi-provider routing and support for on-prem/air-gapped deployments.