What you'd actually do

Work directly with the most important and exciting startup customers in the GenAI model training and inference space, helping them adopt and scale large-scale workloads (e.g., frontier models, multi-modal systems, optimization) on AWS

Advise customers on AI infrastructure requirements and trade-offs including GPU/Trainium selection, cluster topology, storage, networking (EFA), and cost optimization for training and inference

Provide deep technical guidance on inference optimization: model serving architectures (self-managed on EKS, SageMaker endpoints, SageMaker HyperPod Serving), batching strategies, quantization, model parallelism, and latency/throughput trade-offs

Provide deep technical guidance on training optimization: distributed training strategies, framework selection (PyTorch, JAX, NeMo), SageMaker HyperPod, Slurm/PCS integration, checkpointing, and data pipeline design

Help customers understand and apply model optimization techniques: fine-tuning approaches (LoRA, QLoRA, full fine-tuning), RLHF/DPO, knowledge distillation, and efficient serving techniques (vLLM, TensorRT-LLM, Triton)

Skills

Required

Deep understanding of AI infrastructure (GPU, Trainium, networking)
Expertise in model optimization for inference and distributed training
Experience with model serving architectures (EKS, SageMaker endpoints)
Knowledge of distributed training strategies and frameworks (PyTorch, JAX, NeMo)
Familiarity with fine-tuning approaches (LoRA, QLoRA, full fine-tuning)
Experience with efficient serving techniques (vLLM, TensorRT-LLM, Triton)
GPU and accelerator profiling
Understanding of hardware layer (GPU architectures, NVLink, EFA networking)
Experience with orchestration layer (EKS/Kubernetes, SageMaker HyperPod, Slurm/PCS)
Knowledge of framework/model layer (distributed training, inference frameworks)
Experience with profiling and debugging tools (NVIDIA Nsight, DCGM, PyTorch Profiler)

Nice to have

Experience with RLHF/DPO
Experience with SageMaker HyperPod Serving
Experience with Slurm/PCS integration
Experience with AWS compute, networking, and ML platform services

What the JD emphasized

AI infrastructure
model training
inference optimization
large-scale models
frontier AI model builders
optimization of models
inference serving
distributed training at scale
large-scale workloads
frontier models
multi-modal systems
model serving architectures
training optimization
distributed training strategies
model optimization techniques
fine-tuning approaches
efficient serving techniques
deep infrastructure and systems background
hands-on ML/AI expertise
large-scale training
systematic performance tuning

Do you want to help define the future of Generative AI on AWS as part of the Specialist Solutions Architect team in the Startup Go-To-Market (GTM) organization? Are you passionate about AI infrastructure and about helping customers understand the complexities of training and serving large-scale models? You will be part of the core Specialist Organization's Startup GenAI GTM team, focused on AI infrastructure for model training and inference optimization. You will be responsible for defining, building, and deploying targeted strategies to accelerate adoption of AWS compute, networking, and ML platform services among lighthouse frontier AI model builders at startups across industry verticals.

This role sits at the intersection of AI infrastructure architecture and model optimization: you will help customers understand hardware requirements and complexity (GPU, Trainium, networking) while also providing deep expertise in model optimization techniques for both inference serving and distributed training at scale.

AWS Specialist Solutions Architects (SSAs) are technologists with deep domain-specific expertise, able to address advanced concepts and feature designs. As part of the AWS sales organization, SSAs work with customers whose complex challenges require expert-level knowledge to solve. SSAs craft scalable, flexible, and resilient technical architectures that address those challenges.

Key job responsibilities

Work directly with the most important and exciting startup customers in the GenAI model training and inference space, helping them adopt and scale large-scale workloads (e.g., frontier models, multi-modal systems, optimization) on AWS
Advise customers on AI infrastructure requirements and trade-offs including GPU/Trainium selection, cluster topology, storage, networking (EFA), and cost optimization for training and inference
Provide deep technical guidance on inference optimization: model serving architectures (self-managed on EKS, SageMaker endpoints, SageMaker HyperPod Serving), batching strategies, quantization, model parallelism, and latency/throughput trade-offs
Provide deep technical guidance on training optimization: distributed training strategies, framework selection (PyTorch, JAX, NeMo), SageMaker HyperPod, Slurm/PCS integration, checkpointing, and data pipeline design
Guide customers on GPU and accelerator profiling: identifying bottlenecks (compute, memory, I/O), optimizing utilization, and tuning system-level performance
Help customers understand and apply model optimization techniques: fine-tuning approaches (LoRA, QLoRA, full fine-tuning), RLHF/DPO, knowledge distillation, and efficient serving techniques (vLLM, TensorRT-LLM, Triton)
Help Go-To-Market Specialists define and drive strategy on growth-driving assets: market sizing, building an opportunity pipeline, creating technical content to train field teams, and establishing thought leadership
Develop demos, proofs of concept, reference architectures, and benchmarks that demonstrate the value proposition of AWS infrastructure for GenAI workloads
Collaborate with product teams (EC2, Trainium/Inferentia, SageMaker, EKS, PCS) to shape product vision, prioritize features, and represent the voice of the customer
Work with account teams, research scientists, ISVs, framework communities, and model providers to drive implementations and accelerate innovation

A day in the life

As the ideal candidate, you possess a deep infrastructure and systems background combined with hands-on ML/AI expertise that enables you to lead engagements with frontier AI labs, startups, and large enterprises. You understand:

The hardware layer: GPU architectures (NVIDIA A100/H100/B200, AWS Trainium/Inferentia), NVLink, EFA networking, storage hierarchies (FSx for Lustre, S3), and how they interact at scale
The orchestration layer: how to run large-scale training on at least one of EKS/Kubernetes, SageMaker HyperPod, or Slurm/PCS, including cluster management, job scheduling, fault tolerance, and elastic scaling
The framework/model layer: distributed training paradigms, inference frameworks (vLLM, llm-d, Triton, SGLang, etc.), and optimization techniques (quantization, speculative decoding, KV-cache optimization)
The profiling and debugging layer: GPU profiling tools (NVIDIA Nsight, DCGM, PyTorch Profiler), identifying compute/memory/communication bottlenecks, and systematic performance tuning

You have the technical depth to articulate the benefits of AWS infrastructure to ML engineers, platform engineers, and C-level executives. You are adept at working across AWS teams (product, solutions architecture, sales, marketing, professional services) and externally with customers, partners, and the open-source ML community.

About the team

About AWS

Diverse Experiences

AWS values diverse experiences. Even if you do not meet all of the preferred qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.

Why AWS?

Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating. That’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.

Inclusive Team Culture

Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon conferences, inspire us to never stop embracing our uniqueness.

Mentorship & Career Growth

We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.

Work/Life Balance

We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.

Basic Qualifications

Experience conveying complex technical concepts to both technical and business audiences
8+ years of experience in technology domain areas (e.g., systems engineering, cloud infrastructure, HPC, ML/AI, distributed computing)
3+ years of experience designing, implementing, or consulting on large-scale AI/ML infrastructure with hands-on experience on GPU-based computing, ML training infrastructure, and inference serving systems

Preferred Qualifications

Experience in developing and deploying LLMs in production on GPUs, Neuron, TPU or other AI acceleration hardware, or experience with CUDA kernels or ML/low-level kernels
Experience with vLLM, SGLang, TensorRT or similar platforms in production environments, or experience in performant kernel development (CUTLASS, FlashInfer)
Experience with container orchestration for ML: EKS and Kubernetes tooling for ML such as KubeRay, Karpenter, KEDA, and Kubernetes DRA (Dynamic Resource Allocation)
Experience with HPC schedulers and managed platforms: Slurm, AWS PCS (Parallel Computing Service), SageMaker HyperPod
Experience with fine-tuning techniques: LoRA, QLoRA, RLHF, DPO, knowledge distillation, quantization, KV-cache optimization

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.

The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at https://amazon.jobs/en/benefits.

USA, NY, New York - 169,000.00 - 228,600.00 USD annually
USA, VA, Herndon - 153,600.00 - 207,800.00 USD annually

Sr. GenAI Infra Specialist SA, AWS WWSO Startup