Sr Software Development Engineer, System and Embedded PCIe and Neuron Link

Amazon · Big Tech · Austin, TX · Software Development

This role is for a Senior Software Development Engineer who develops mission-critical software for the interconnect (PCIe and Neuron Link) of Amazon's next-generation machine learning platforms, specifically the custom Trainium AI servers. Responsibilities include integrating systems with EC2 teams, driving qualification processes, and developing device drivers and automation software for hardware bring-up and for monitoring ML workloads in AWS data centers. Although the role supports ML platforms, its core craft is system and embedded software development for hardware interconnects, not AI/ML model development or deployment.

What you'd actually do

  1. Develop mission-critical software for the interconnect (PCIe and Neuron Link) of Annapurna Labs' next-generation machine learning platforms
  2. Collaborate with EC2 teams and manufacturing partners to ensure seamless system integration
  3. Drive end-to-end qualification processes for new hardware implementations

Skills

Required

  • 5+ years of non-internship professional software development experience
  • 5+ years of experience programming in at least one software programming language
  • 5+ years of experience leading the design or architecture (design patterns, reliability, and scaling) of new and existing systems
  • 5+ years of experience across the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
  • Experience as a mentor or tech lead, or leading an engineering team
  • Operating systems
  • Linux architecture
  • Embedded systems
  • Control systems
  • C
  • C++
  • Lua
  • Bash
  • Python
  • Device drivers
  • Automation software

Nice to have

  • Bachelor's degree in computer science or equivalent

What the JD emphasized

  • mission-critical software
  • next-generation machine learning platforms
  • custom Trainium AI servers
  • customer machine learning workloads