Senior Software Engineer (ML), Data Plane

Amazon · Big Tech · IL, Tel Aviv · Software Development

Senior Software Engineer focused on optimizing the ML inference data plane for custom hardware: compute kernels, serving-framework integration, and end-to-end execution of large distributed models.

What you'd actually do

  1. Develop and optimize compute kernels for a custom ML accelerator architecture, targeting production-level performance for large language model inference.
  2. Implement and validate LLM architectures (decoder-only, mixture-of-experts) end-to-end, from PyTorch model definition through distributed execution on custom hardware.
  3. Integrate custom accelerator backends into open-source ML frameworks such as vLLM and PyTorch, including scheduler extensions, memory management, and model parallelism.
  4. Build and maintain test infrastructure for model correctness validation across CPU, GPU, simulator, and hardware targets.
  5. Profile and optimize inference workloads: identify bottlenecks, instrument critical paths, and drive latency and throughput improvements from simulation through hardware bring-up.

Skills

Required

  • C/C++
  • Linux systems knowledge
  • Machine Learning and LLM fundamentals
  • transformer architecture
  • training/inference lifecycles
  • optimization techniques
  • computer architecture
  • operating systems
  • parallel computing
  • developing compute kernels for GPUs, DSPs, or custom accelerators
  • owning and delivering complex software features end-to-end

Nice to have

  • JAX
  • PyTorch
  • vLLM
  • SGLang
  • Dynamo
  • PyTorch/XLA
  • TensorRT
  • deploying LLMs in production on GPUs, AWS Neuron devices, TPUs, or other AI acceleration hardware
  • CUDA kernels
  • ML/low-level kernels
  • speculative decoding
  • KV cache optimization
  • LLM serving optimizations
  • distributed systems
  • collective communication
  • RDMA
  • high-speed interconnect programming
  • hardware simulation environments
  • model validation workflows
  • using LLMs or code-generation agents as part of a daily workflow

What the JD emphasized

  • production-level performance
  • custom hardware
  • end-to-end
  • low-level code
  • performance regressions
  • complex software features end-to-end

Other signals

  • custom hardware
  • large distributed models
  • low-level code optimization
  • inference performance