ML Software Engineer, Data Plane

Amazon · Big Tech · IL, Tel Aviv · Software Development

ML Software Engineer focused on the inference data plane, optimizing software for custom hardware to run large models efficiently. Responsibilities include kernel optimization, model architecture integration, serving framework integration, and performance profiling for LLM inference.

What you'd actually do

  1. Develop and optimize compute kernels for a custom ML accelerator architecture, targeting production-level performance for large language model inference.
  2. Implement and validate LLM architectures (decoder-only, mixture-of-experts) end-to-end - from PyTorch model definition through distributed execution on custom hardware.
  3. Integrate custom accelerator backends into open-source ML serving frameworks (vLLM, PyTorch), including scheduler extensions, memory management, and model parallelism.
  4. Build and maintain test infrastructure for model correctness validation across CPU, GPU, simulator, and hardware targets.
  5. Profile and optimize inference workloads - identify bottlenecks, instrument critical paths, and drive latency and throughput improvements from simulation through hardware bringup.

Skills

Required

  • 3+ years of experience across the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
  • Knowledge of computer architecture, operating systems, and parallel computing

Nice to have

  • Knowledge of machine learning and LLM fundamentals, including transformer architecture, training/inference lifecycles, and optimization techniques
  • Experience with ML frameworks including JAX, PyTorch, vLLM, SGLang, Dynamo, TorchXLA, and TensorRT
  • Experience developing and deploying LLMs in production on GPUs, Neuron, TPU, or other AI acceleration hardware

What the JD emphasized

  • production-level performance
  • custom hardware
  • large language model inference
  • end-to-end
  • distributed execution on custom hardware
  • open-source ML serving frameworks
  • model parallelism
  • hardware targets
  • hardware bringup

Other signals

  • optimize low-level code for custom hardware
  • validate model architectures end-to-end
  • build test and profiling infrastructure
  • drive performance across the stack