Senior Software Engineer, Release Infra

Brex · Fintech · São Paulo, Brazil · Engineering

Brex is seeking a Senior Software Engineer, Release Infra, to design, build, and operate core systems for release, observability, and incident management. The role focuses on making releases safe, fast, and reliable while scaling the underlying infrastructure. The engineer will drive technical strategy, collaborate with cross-functional teams, and improve the end-to-end release process. Experience with CI/CD, distributed systems, cloud platforms, and SRE practices is required.

What you'd actually do

  1. Design, build, and maintain the release infrastructure that powers Brex’s deployment pipelines and incident workflows
  2. Drive technical strategy and architecture for release and observability systems, making them more scalable, reliable, and secure
  3. Collaborate with product, engineering, and operations partners to ensure Brex’s releases are safe, predictable, and low-friction
  4. Identify and deliver improvements to the end-to-end release process (from code merge to production) to reduce risk and cycle time
  5. Build and evolve tooling for observability and incident response, enabling fast detection, triage, and resolution

Skills

Required

  • Go
  • Java
  • Kotlin
  • Python
  • CI/CD
  • GitHub Actions
  • CircleCI
  • Buildkite
  • Argo
  • Spinnaker
  • Jenkins
  • AWS
  • GCP
  • Azure
  • Docker
  • Kubernetes
  • Terraform
  • CloudFormation
  • metrics
  • logs
  • tracing
  • SLIs/SLOs
  • error budgets
  • incident management
  • SQL
  • NoSQL

Nice to have

  • release engineering
  • observability
  • SRE
  • backend systems
  • infrastructure systems

What the JD emphasized

  • 7+ years of professional experience designing, building, and operating backend or infrastructure systems in production
  • Strong proficiency in backend programming languages (e.g., Go, Java, Kotlin, or Python) with a focus on reliability and performance
  • Hands-on experience with CI/CD and release pipelines (e.g., GitHub Actions, CircleCI, Buildkite, Argo, Spinnaker, Jenkins) including build, test, and deployment automation
  • Experience architecting and operating scalable, high-availability distributed systems on cloud platforms (e.g., AWS, GCP, Azure)
  • Deep familiarity with containerization and orchestration (e.g., Docker, Kubernetes) and infrastructure-as-code (e.g., Terraform, CloudFormation)
  • Experience designing and maintaining observability tooling (metrics, logs, tracing) and integrating it into incident response workflows
  • Strong understanding of reliability and SRE practices, including SLIs/SLOs, error budgets, and incident management best practices
  • Proven track record of improving release processes (e.g., reducing deployment risk, increasing deployment frequency, automating rollbacks)