Machine Learning Israel (MLIL), part of Annapurna Labs / Amazon, is hiring a Lab Engineer to own and operate the labs that power the bring-up and validation of our next-generation ML training and inference racks. In this role you will build, maintain, and continuously evolve the lab infrastructure, from bench setups to server racks, used daily by HW, FW, and SW engineers. You will be the go-to person for delivering working, instrumented setups that the R&D teams can pick up and run with.
Key job responsibilities
- Own the MLIL hardware lab in the Tel Aviv office: physical layout, power and cooling budget, network topology, cabling, asset tracking, and day-to-day operations.
- Build, configure, and connect new lab setups for HW, FW, and SW engineers (including servers, GPU sleds, PCIe switches, retimers, NICs, and DRAM modules) and deliver them ready for R&D use.
- Administer and maintain Linux-based servers and systems, including installation, configuration, and optimization.
- Manage and configure network services such as DHCP, PXE, and other critical infrastructure components.
- Run sanity tests on every delivered setup (boot, PCIe enumeration, basic DRAM check, network reachability) so R&D teams pick up a known-good baseline and can focus on their work.
- Write and maintain automation scripts (Python / Bash) for repetitive lab tasks: power cycling, log collection, provisioning, imaging, and test-harness setup.
- Procure, inventory, and manage lab equipment: bench PSUs, scopes, protocol analyzers, thermal chambers, JTAG debuggers, cables, and fixtures.
- Triage lab-level issues (power, network, cabling, imaging) to unblock R&D fast; escalate deep HW / FW / SW debug (e.g., RDMA / GPU / EFA internals) to the relevant specialist teams.
Basic Qualifications
- 3+ years of experience as a System Administrator / Lab Engineer or in a similar role.
- Knowledge of Linux operating systems and server administration.
- Solid understanding of networking fundamentals — Ethernet, TCP/IP, link-layer debug, switch / NIC configuration.
Preferred Qualifications
- Proven hands-on experience with lab instrumentation: scopes, logic analyzers, protocol analyzers, bench PSUs, JTAG / BMC debug.
- B.Sc in Electrical / Electronics / Computer Engineering, or a Practical Engineer diploma (הנדסאי) with hands-on experience.
- Solid understanding of PCIe — enumeration, link training, lane configuration, error reporting (AER), and common debug flows.
- Experience with BMC / BIOS / UEFI debug, IPMI, Redfish.
- Experience with high-speed serial debug — SerDes, equalization, eye diagrams, BER testing.
- Proficient in Python / Bash automation and willing to write production-grade lab tooling.
Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.