What you'd actually do

Architect and build scalable ML infrastructure powering LLM training and post-training workflows, including supervised fine-tuning, reinforcement learning, and continuous learning from live traffic

Transform real-world customer interactions into high-quality training signals, enabling continuous model improvement and better customer experiences

Build and optimize post-training and RL systems, including reward modeling, policy optimization, and data collection loops

Drive experimentation and iteration velocity by building tooling and frameworks that enable rapid hypothesis testing, signal validation, and model quality improvements

Partner closely with applied scientists to translate frontier techniques (e.g., RLHF, agentic workflows, multi-turn optimization) into reliable, production-grade systems

Skills

Required

5+ years of non-internship professional software development experience
5+ years of programming experience in at least one programming language
5+ years of experience leading design or architecture (design patterns, reliability, and scaling) of new and existing systems
Experience as a mentor, tech lead or leading an engineering team
Experience with vLLM, SGLang, TensorRT or similar platforms in production environments
Experience with CUDA kernels or ML/low-level kernels

Nice to have

5+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
Bachelor's degree in computer science or equivalent
Experience with Machine Learning and Large Language Model fundamentals, including architecture, training/inference lifecycles, and optimization of model execution

We are building foundational LLMs for Amazon Stores that fuse world knowledge with deep e-commerce understanding to power next-generation shopping experiences. These systems continuously learn from real-world customer interactions to become more helpful, personalized, and context-aware over time.

We are looking for builders who are passionate about large-scale systems, AI innovation, and customer impact. You will work at the intersection of distributed systems, machine learning infrastructure, and science to bring frontier research—especially in post-training and reinforcement learning—into production at Amazon scale.

Key job responsibilities

Architect and build scalable ML infrastructure powering LLM training and post-training workflows, including supervised fine-tuning, reinforcement learning, and continuous learning from live traffic
Transform real-world customer interactions into high-quality training signals, enabling continuous model improvement and better customer experiences
Build and optimize post-training and RL systems, including reward modeling, policy optimization, and data collection loops
Drive experimentation and iteration velocity by building tooling and frameworks that enable rapid hypothesis testing, signal validation, and model quality improvements
Partner closely with applied scientists to translate frontier techniques (e.g., RLHF, agentic workflows, multi-turn optimization) into reliable, production-grade systems
Own systems end-to-end, including design, implementation, deployment, observability, and operational excellence
Raise the engineering bar through technical leadership, design reviews, and mentorship, influencing best practices across the organization

Basic Qualifications

5+ years of non-internship professional software development experience
5+ years of programming experience in at least one programming language
5+ years of experience leading design or architecture (design patterns, reliability, and scaling) of new and existing systems
Experience as a mentor, tech lead or leading an engineering team
Experience with vLLM, SGLang, TensorRT or similar platforms in production environments
Experience with CUDA kernels or ML/low-level kernels

Preferred Qualifications

5+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
Bachelor's degree in computer science or equivalent
Experience with Machine Learning and Large Language Model fundamentals, including architecture, training/inference lifecycles, and optimization of model execution

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Los Angeles County applicants: Job duties for this position include: work safely and cooperatively with other employees, supervisors, and staff; adhere to standards of excellence despite stressful conditions; communicate effectively and respectfully with employees, supervisors, and staff to ensure exceptional customer service; and follow all federal, state, and local laws and Company policies. Criminal history may have a direct, adverse, and negative relationship with some of the material job duties of this position. These include the duties and responsibilities listed above, as well as the abilities to adhere to company policies, exercise sound judgment, effectively manage stress and work safely and respectfully with others, exhibit trustworthiness and professionalism, and safeguard business operations and the Company’s reputation. Pursuant to the Los Angeles County Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.

The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at https://amazon.jobs/en/benefits.

USA, CA, Palo Alto - 193,300.00 - 261,500.00 USD annually
USA, WA, Seattle - 168,100.00 - 227,400.00 USD annually

Senior Software Development Engineer, Stores Foundational AI - Rufus
