System Development Engineer, AWS EC2 Nitro Team

Amazon · Big Tech · Santa Clara, CA · Software Development

This role is for a System Development Engineer on the AWS EC2 Nitro team, focusing on building and maintaining the infrastructure for EC2 machine learning platforms. The engineer will be responsible for operational health, build/release systems, and ensuring service uptime, while also developing tools to streamline operations and improve CI/CD processes. The role involves mentoring junior engineers and contributing to the overall operational excellence of the service.

What you'd actually do

  1. Developing tools to streamline operations as we scale, reducing operational load and improving our overall operational posture. Influencing and driving operational excellence and best practices within the organization.
  2. Identifying ways to automate and improve our test infrastructure and make our CI/CD more robust and flexible.
  3. Playing a key role in investigating and recommending best practices for maintaining and improving code quality, fleet health, and the security and reliability of our service.
  4. Growing our talent through actively mentoring junior engineers, improving their skills, their knowledge of our systems, and their ability to get things done. Sharing your knowledge with wider teams and writing clear and concise documentation to allow other engineers to get the most out of the service and tools.

Skills

Required

  • 2+ years of non-internship professional software development experience
  • 2+ years of experience designing or architecting new and existing systems (design patterns, reliability, and scaling)
  • 7+ years of administrative experience in networking, storage systems, and operating systems, and hands-on systems engineering experience
  • Knowledge of systems engineering fundamentals (networking, storage, operating systems)
  • Experience programming with at least one modern language such as C++, C#, Java, Python, Golang, PowerShell, Ruby

Nice to have

  • Experience with PowerShell (preferred), Python, Ruby, or Java
  • Experience working in an Agile environment using the Scrum methodology
  • Experience in automating, deploying, and supporting large-scale infrastructure

What the JD emphasized

  • developing the next generation of EC2 machine learning platforms
  • maintaining the operational health of the service
  • maintaining build & release systems
  • ensuring maximum up-time for our developers and customers
  • developing tools to streamline operations
  • automate and improve our test infrastructure
  • CI/CD more robust and flexible
  • maintaining and improving code quality
  • fleet health
  • security & reliability of our service
  • mentoring junior engineers