Sr. Systems Development Engineer (AWS Generative AI & ML Servers), AWS HW Engineering

Amazon · Big Tech · Austin, TX · Systems, Quality, & Security Engineering

This role focuses on building and operating AWS cloud infrastructure for AI training and inference, specifically targeting high-performance and scalable solutions for large language models. The engineer will work on server designs, system-level debugging, and implementing automation solutions, including agentic workflows and AI-driven tools, to enhance the productivity of other engineers and influence AI implementation and core architecture.

What you'd actually do

  1. You will be a technical leader solving complex architectural problems that may not have been defined beforehand.
  2. You will own the team's systems and work proactively to identify deficiencies, write tactical code to solve issues before they impact customers, and work with your team to scale the solution.
  3. You will decompose large, difficult server-system testability, reliability, and diagnosis problems into straightforward tasks, components, or features that you will deliver yourself and lead others to deliver in parallel.
  4. You will use a combination of hardware, software, system design, x86 architecture, process, diagnosis, and operations knowledge.
  5. In this role you will create automation through agentic workflows.
  6. You’ll develop smart automation solutions, implement AI-driven tools and workflows, and be part of the AI transformation.

Skills

Required

  • Systems Development Engineering
  • AWS cloud infrastructure
  • AI/ML workloads
  • HPC workloads
  • Server hardware design
  • Software development
  • System debugging
  • Automation
  • Agentic workflows
  • AI-driven tools

Nice to have

  • x86 architecture
  • Full technical stack knowledge (bare metal to userland)

What the JD emphasized

  • building the backbone of Generative AI cloud at AWS
  • designing, delivering and operating AWS cloud offerings that enable high performance and scalability in AI/ML and HPC workloads
  • direct impact on AI-powered innovation
  • implementing automation solutions that directly enhance the productivity of our engineers
  • influence both AI implementation and core architecture
  • create automation through agentic workflows
  • implement AI-driven tools and workflows
