Infrastructure Engineer/SRE

Cresta · Vertical AI · AB, Canada · Remote · Engineering

Infrastructure Engineer/SRE role focused on building and advancing core infrastructure for an AI company, including developer toolchains, Kubernetes clusters, logging, and infrastructure-as-code. The role specifically involves building machine learning infrastructure to support AI teams in training, testing, and deploying models on large-scale datasets.

What you'd actually do

  1. Build the developer toolchain. Partner with engineers on dev tools that support developer workflows and deployment infrastructure.
  2. Ensure reliability of multi-cloud Kubernetes clusters and pipelines.
  3. Build metrics, logging, analytics, and alerting for performance and security across all endpoints and applications.
  4. Build infrastructure-as-code deployment tooling and supporting services on multiple cloud providers.
  5. Automate operations and engineering. Focus on automation so we can spend energy where it matters.
  6. Build machine learning infrastructure that enables AI teams to train, test, and deploy models on large-scale datasets.

Skills

Required

  • Golang or Python
  • Kubernetes
  • AWS
  • Terraform or CloudFormation
  • Helm or Kustomize
  • GitOps tooling (Flux or Argo)
  • CI/CD (GitHub Actions)
  • PostgreSQL

Nice to have

  • GPU-enabled clusters
  • Google Cloud
  • Azure

What the JD emphasized

  • 5+ years of experience in DevOps, Site Reliability Engineering, Production Engineering, or an equivalent field.

Other signals

  • building machine learning infrastructure
  • train, test, and deploy on large-scale datasets
  • ensure reliability of multi-cloud Kubernetes clusters and pipelines