Senior Infrastructure Engineer/SRE

Cresta · Vertical AI · AB, Canada · Remote · Engineering

Senior Infrastructure Engineer/SRE role focused on building and advancing core infrastructure, including developer toolchains, Kubernetes clusters, metrics/logging, and infrastructure-as-code. A key responsibility is building machine learning infrastructure to support AI teams in training, testing, and deploying models.

What you'd actually do

  1. Developer Toolchain. Partner with engineers to build dev tools that empower developer workflows and deployment infrastructure.
  2. Ensure reliability of multi-cloud Kubernetes clusters and pipelines.
  3. Build and advance metrics, logging, analytics, and alerting for performance and security across all endpoints and applications.
  4. Build and advance infrastructure-as-code deployment tooling and supporting services on multiple cloud providers.
  5. Automate operations and engineering. Focus on automation so we can spend energy where it matters.
  6. Build machine learning infrastructure that enables AI teams to train, test, and deploy models on large-scale datasets.

Skills

Required

  • DevOps
  • Site Reliability Engineering
  • Production Engineering
  • Golang
  • Python
  • Container security
  • Kubernetes
  • Helm
  • Kustomize
  • Terraform
  • CloudFormation
  • AWS
  • IAM
  • S3
  • EC2
  • EKS
  • PostgreSQL
  • GitOps
  • Flux
  • Argo
  • CI/CD
  • GitHub Actions

Nice to have

  • GPU-enabled clusters
  • Google Cloud
  • Azure

What the JD emphasized

  • 5+ years of experience in DevOps, Site Reliability Engineering, Production Engineering, or an equivalent field.
  • Deep proficiency with coding languages such as Golang or Python.
  • Deep familiarity with container-related security best practices.
  • Production experience working with Kubernetes, and a deep understanding of the Kubernetes ecosystem, including popular open-source tooling such as cert-manager or external-dns.
  • Production experience with IaC tools such as Terraform or CloudFormation.
  • Production experience working with AWS and services such as IAM, S3, EC2, and EKS.