Engineering Manager, Cloud Infrastructure

Brex · Fintech · New York, NY +2 · Engineering

This is an Engineering Manager role for Cloud Infrastructure at Brex, a fintech company. The role focuses on leading a senior engineering team that manages and scales the foundational AWS ecosystem, with a primary near-term mission of driving a Business Continuity and Disaster Recovery (BCDR) initiative. Responsibilities include people management, technical stewardship of cloud infrastructure (Kubernetes, RDS, networking), cross-functional collaboration, and operational excellence.

What you'd actually do

  1. Manage and support a senior team of 6 ICs, fostering a culture of high ownership, autonomy, and technical excellence.
  2. Oversee the execution of the multi-year BCDR initiative, ensuring multi-region infrastructure readiness and automated failover capabilities.
  3. Maintain and scale Brex’s core cloud infrastructure, including Kubernetes (EKS), RDS/Aurora/Postgres, networking (VPC), and AWS service integrations.
  4. Partner with product and engineering teams across Brex to drive adoption of new infrastructure capabilities and ensure successful migrations.
  5. Improve team hygiene by establishing and refining best practices for work tracking (Linear), documentation, and incident response.

Skills

Required

  • 3+ years of experience directly managing software or infrastructure engineering teams.
  • Deep hands-on experience with cloud infrastructure foundations, specifically AWS, Kubernetes, and Postgres.
  • Proven ability to design and implement complex infrastructure projects such as multi-region deployments or disaster recovery frameworks.
  • Strong communication skills with the ability to influence technical decisions across multiple autonomous teams.
  • Ability to dive deep into technical issues while maintaining a high-level strategic view of organizational goals.
  • 5+ years of experience in software or systems engineering.

Nice to have

  • Experience with Infrastructure as Code tools, specifically Terraform.
  • Background in managing cloud infrastructure cost optimization programs.
  • Experience with our tech stack components: Go, Kotlin, Python, or Elixir.
  • Passion for leveraging AI and LLM-assisted workflows to increase engineering velocity.

What the JD emphasized

  • high-priority Business Continuity and Disaster Recovery (BCDR) initiative
  • multi-region failover
  • infrastructure-wide resilience
  • senior team
  • deep technical involvement
  • multi-region infrastructure readiness
  • automated failover capabilities
  • core cloud infrastructure
  • multi-region deployments
  • disaster recovery frameworks