Engineering Manager, Site Reliability

Moveworks · Enterprise · Bangalore, KA, India · Core Infrastructure

The Engineering Manager, Site Reliability will architect and manage Moveworks AI cloud infrastructure and strategy. The role focuses on designing and operating resilient, secure cloud infrastructure so that products operate reliably and features are released rapidly. Responsibilities include improving observability, debuggability, and overall system reliability, with a focus on AWS and related technologies.

What you'd actually do

  1. Architect and manage Moveworks AI cloud infrastructure and strategy
  2. Design and operate resilient, secure cloud infrastructure that lets our products operate reliably and our engineering teams build and release customer-facing features rapidly
  3. Improve observability and reliability of Moveworks systems by building and managing monitoring and alerting infrastructure
  4. Improve debuggability: build and manage systems that help debug production issues and analyze performance
  5. Architect, design, and execute projects to improve the reliability of our applications and systems

Skills

Required

  • Python
  • Go
  • AWS
  • Jenkins
  • Terraform
  • Ansible
  • Helm
  • distributed systems
  • monitoring and alerting
  • debuggability
  • performance analysis

Nice to have

  • Azure

What the JD emphasized

  • 8+ years of experience in software engineering / SRE with significant experience in Python / Go
  • 2+ years of experience leading projects and designing, analyzing, and troubleshooting distributed systems
  • Experience building and managing infrastructure systems for deploying and managing workloads in AWS