Software Development Engineer, AWS Incident Tooling & Response

Amazon · Big Tech · D, Ireland +1 · Software Development

Software Development Engineer role focused on building the next generation of incident management tooling for AWS, using agentic AI development practices to create a unified, highly available, and performant platform for coordinating the response to critical incidents. The engineer will own significant portions of the service architecture, including the data layer, authorization system, and API model, and will integrate the platform with automation systems.

What you'd actually do

  1. Design and implement service components for a multi-region, multi-tenant incident management platform.
  2. Own subsystems including the data layer, authorization, and API surface.
  3. Build integrations with incident automation systems, conference bridge providers, and downstream event consumers.
  4. Drive technical design decisions, balancing reliability, performance, and delivery speed.
  5. Participate in operational support and ensure the service is resilient during the incidents it is designed to manage.

Skills

Required

  • Experience (non-internship) in professional software development
  • Experience designing, building, operating, and managing large-scale distributed systems or web services
  • Experience using generative AI tools to accelerate engineering workflows

Nice to have

  • Experience or certifications in API design, cloud architecture/deployment, service-oriented architecture, mobile development, performance optimization, database design, and related fields
  • Experience with authorization systems (IAM, RBAC, or attribute-based access control)
  • Experience mentoring other engineers
  • Experience writing technical design documents and driving alignment across teams

What the JD emphasized

  • agentic AI development practices
  • AWS global infrastructure
  • high-severity incidents
  • operating when AWS services are degraded

Other signals

  • unified platform