Senior Software Development Engineer (AWS ML), Machine Learning Israel (MLIL) - Flow Sub-team (Fleet Lifecycle & Operational Workflows)

Amazon · Big Tech · IL, Tel Aviv · Software Development

Senior Software Development Engineer for Amazon's MLIL FLOW team, focusing on systems software for next-generation ML accelerator servers. Responsibilities include leading the design and implementation of hardware validation, diagnostics, and operational software for these servers, from silicon bring-up to fleet-scale deployment. The role involves working with hardware and other teams, debugging complex issues, building data pipelines, and mentoring engineers.

What you'd actually do

  1. Lead the architecture and implementation of hardware validation and diagnostic software for new ML acceleration platforms.
  2. Drive technical direction for PCIe validation, power/thermal diagnostics, and stress-testing frameworks that run across manufacturing, vetting, and production environments.
  3. Own subsystems end-to-end: from design through implementation, testing, deployment, and operational excellence at fleet scale.
  4. Work with Hardware, Manufacturing, and EC2 teams to create coordinated software packages that enable both qualification and rapid deployment.
  5. Debug complex hardware/software interaction failures on first silicon and production fleet returns, and drive root-cause analysis to closure.

Skills

Required

  • Bachelor's degree or above in Computer Science, Computer Engineering, Electrical Engineering, or related fields
  • At least 8 years of professional software development experience

Nice to have

  • Experience with hardware bring-up, ASIC/FPGA validation, or manufacturing test development
  • Proficiency in scripting languages (Python, Lua) for test automation and data analysis
  • Track record of cross-team influence and delivering results through others
  • Experience building data pipelines, ETL systems, or fleet-scale monitoring/dashboarding
  • Demonstrated project-management experience leading multiple R&D initiatives in parallel
  • Experience with PCIe or high-speed interconnect validation and debugging

What the JD emphasized

  • At least 8 years of professional software development experience