Software Development Engineer, System and Embedded PCIe and Neuron Link

Amazon · Big Tech · Austin, TX · Software Development

This role focuses on developing mission-critical software for the interconnect (PCIe and Neuron Link) of Amazon's next-generation machine learning platforms and custom AI servers (Trainium). The engineer will collaborate with EC2 teams and manufacturing partners on system integration and qualification, working with languages like C, C++, Lua, Bash, and Python to develop device drivers and automation software. The core responsibilities involve enabling and monitoring accelerated compute servers for ML workloads, bringing up new hardware, and developing automated test and deployment pipelines.

What you'd actually do

  1. Develop mission-critical software that powers Annapurna Labs' next-generation machine learning platforms' interconnect (PCIe and Neuron Link)
  2. Collaborate with EC2 teams and manufacturing partners to ensure seamless system integration
  3. Drive end-to-end qualification processes for new hardware implementations
  4. Develop device drivers and automation software
  5. Develop automated software test and deployment pipelines to ensure software quality, compatibility, and upgradeability

Skills

Required

  • 3+ years of non-internship professional software development experience
  • 3+ years of non-internship design or architecture (design patterns, reliability and scaling) of new and existing systems experience
  • Bachelor's degree in computer science or equivalent
  • C
  • C++
  • Lua
  • Bash
  • Python
  • device drivers
  • automation software

Nice to have

  • Experience with PCIe subsystems or controllers. Experience can range from supporting PCIe devices to programming controller firmware to device driver implementation.

What the JD emphasized

  • mission-critical software
  • next-generation machine learning platforms
  • custom silicon
  • PCIe subsystems or controllers