Software Development Engineer II - Amazon MSK, Managed Streaming Kafka (MSK), MSK Hoover

Amazon · Big Tech · Seattle, WA · Software Development

This role focuses on building and operating automation for Amazon MSK, a large-scale managed streaming service for Apache Kafka. The primary goal is to keep a fleet of hundreds of thousands of stateful hosts healthy, secure, and available, so that infrastructure maintenance stays invisible to customers. Responsibilities include designing and building systems for host maintenance, automated health detection and remediation, and safe rollout and rollback, along with end-to-end service ownership: debugging production issues and reducing manual operational effort. The team increasingly uses generative AI to strengthen these mechanisms, but the core of the role is infrastructure engineering and automation.

What you'd actually do

  1. Design, build, and operate automation that patches and maintains hundreds of thousands of stateful hosts, keeping fleet maintenance invisible to customers.
  2. Build systems that automatically detect unhealthy hosts and remediate them, balancing fast recovery against avoiding needless disruption.
  3. Develop rollout and rollback mechanisms that keep the blast radius of any change small at fleet scale, and that let changes be tested before they reach customers and reversed if something goes wrong.
  4. Own your services end to end: take part in on-call, debug production issues, and continually reduce the manual effort needed to operate the fleet.
  5. Write design documents, collaborate with engineers across MSK, and raise the engineering bar through design and code reviews.

Skills

Required

  • 3+ years of non-internship professional software development experience
  • 2+ years of non-internship experience in the design or architecture (design patterns, reliability, and scaling) of new and existing systems
  • Experience programming in at least one programming language

Nice to have

  • 3+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
  • Bachelor's degree in computer science or equivalent