What you'd actually do

Develop and optimize compute kernels for a custom ML accelerator architecture, targeting production-level performance for large language model inference.

Implement and validate LLM architectures (decoder-only, mixture-of-experts) end-to-end - from PyTorch model definition through distributed execution on custom hardware.

Integrate custom accelerator backends into open-source ML serving frameworks (vLLM, PyTorch), including scheduler extensions, memory management, and model parallelism.

Build and maintain test infrastructure for model correctness validation across CPU, GPU, simulator, and hardware targets.

Profile and optimize inference workloads - identify bottlenecks, instrument critical paths, and drive latency and throughput improvements from simulation through hardware bringup.

Skills

Required

C/C++
Linux systems knowledge
Machine Learning and LLM fundamentals
transformer architecture
training/inference lifecycles
optimization techniques
computer architecture
operating systems
parallel computing
developing compute kernels for GPUs, DSPs, or custom accelerators
owning and delivering complex software features end-to-end

Nice to have

JAX
PyTorch
vLLM
SGLang
Dynamo
TorchXLA
TensorRT
deploying LLMs in production on GPUs, Neuron, TPU or other AI acceleration hardware
CUDA kernels
ML/low-level kernels
speculative decoding
KV cache optimization
LLM serving optimizations
distributed systems
collective communication
RDMA
high-speed interconnect programming
hardware simulation environments
model validation workflows
using LLMs or code-generation agents as part of a daily workflow

The MLIL DataPlane team is looking for a Senior Software Development Engineer to own the design and implementation of our inference data plane. We build the software that makes large models run efficiently on custom hardware - spanning model execution, memory management, data movement, and serving integration. Our work covers the full inference path: integrating serving engines with custom hardware, developing high-performance compute kernels, enabling efficient data movement, and driving models from early validation through production. We operate at frontier scale with large distributed models. This is a ground-up effort with rapidly evolving hardware and software. We need a senior IC who can write and optimize low-level code for custom hardware, validate model architectures end-to-end, build test and profiling infrastructure, and drive performance across the stack.

Key job responsibilities

Develop and optimize compute kernels for a custom ML accelerator architecture, targeting production-level performance for large language model inference.
Implement and validate LLM architectures (decoder-only, mixture-of-experts) end-to-end - from PyTorch model definition through distributed execution on custom hardware.
Integrate custom accelerator backends into open-source ML serving frameworks (vLLM, PyTorch), including scheduler extensions, memory management, and model parallelism.
Build and maintain test infrastructure for model correctness validation across CPU, GPU, simulator, and hardware targets.
Profile and optimize inference workloads - identify bottlenecks, instrument critical paths, and drive latency and throughput improvements from simulation through hardware bringup.
Own features end-to-end: from design through implementation, testing, and integration into the broader software stack.
Contribute to CI/CD pipelines that gate model and kernel changes on correctness and performance regressions.
Mentor engineers, drive design reviews, and raise the engineering bar across the team.

Basic Qualifications

Bachelor's degree in computer science or equivalent
7+ years of experience across the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
Knowledge of Machine Learning and LLM fundamentals, including transformer architecture, training/inference lifecycles, and optimization techniques
Knowledge of computer architecture, operating systems, and parallel computing
Strong proficiency in C/C++
Strong Linux systems knowledge
Experience developing compute kernels for GPUs, DSPs, or custom accelerators
Proven track record of owning and delivering complex software features end-to-end

Preferred Qualifications

Knowledge of ML frameworks including JAX, PyTorch, vLLM, SGLang, Dynamo, TorchXLA, and TensorRT
Experience in developing and deploying LLMs in production on GPUs, Neuron, TPU or other AI acceleration hardware, or experience with CUDA kernels or ML/low-level kernels
Familiarity with speculative decoding, KV cache optimization, or other LLM serving optimizations
Experience with distributed systems - collective communication, RDMA, or high-speed interconnect programming
Experience with hardware simulation environments and model validation workflows
Demonstrated early adopter of AI-assisted development tools - uses LLMs or code-generation agents as part of daily workflow

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.

Senior Software Engineer (ML), Data Plane
