Sr GenAI Infra Specialist SA, AWS WWSO Startup

Amazon · Big Tech · NY +1 · Solutions Architect

Senior GenAI Infrastructure Specialist SA on the AWS WWSO Startup team, focusing on AI infrastructure for model training and inference optimization. The role involves advising startup customers on hardware, optimization techniques, and deployment strategies for large-scale AI workloads on AWS.

What you'd actually do

  1. Work directly with the most important and exciting startup customers in the GenAI model training and inference space, helping them adopt and scale large-scale workloads (e.g., frontier models, multi-modal systems, optimization) on AWS
  2. Advise customers on AI infrastructure requirements and trade-offs including GPU/Trainium selection, cluster topology, storage, networking (EFA), and cost optimization for training and inference
  3. Provide deep technical guidance on inference optimization: model serving architectures (self-managed on EKS, SageMaker endpoints, SageMaker HyperPod Serving), batching strategies, quantization, model parallelism, and latency/throughput trade-offs
  4. Provide deep technical guidance on training optimization: distributed training strategies, framework selection (PyTorch, JAX, NeMo), SageMaker HyperPod, Slurm/PCS integration, checkpointing, and data pipeline design
  5. Help customers understand and apply model optimization techniques: fine-tuning approaches (LoRA, QLoRA, full fine-tuning), RLHF/DPO, knowledge distillation, and efficient serving techniques (vLLM, TensorRT-LLM, Triton)

Skills

Required

  • Deep understanding of AI infrastructure (GPU, Trainium, networking)
  • Expertise in model optimization for inference and distributed training
  • Experience with model serving architectures (EKS, SageMaker endpoints)
  • Knowledge of distributed training strategies and frameworks (PyTorch, JAX, NeMo)
  • Familiarity with fine-tuning approaches (LoRA, QLoRA, full fine-tuning)
  • Experience with efficient serving techniques (vLLM, TensorRT-LLM, Triton)
  • GPU and accelerator profiling
  • Understanding of hardware layer (GPU architectures, NVLink, EFA networking)
  • Experience with orchestration layer (EKS/Kubernetes, SageMaker HyperPod, Slurm/PCS)
  • Knowledge of framework/model layer (distributed training, inference frameworks)
  • Experience with profiling and debugging tools (NVIDIA Nsight, DCGM, PyTorch Profiler)

Nice to have

  • Experience with RLHF/DPO
  • Experience with SageMaker HyperPod Serving
  • Experience with Slurm/PCS integration
  • Experience with AWS compute, networking, and ML platform services

What the JD emphasized

  • AI infrastructure
  • model training
  • inference optimization
  • large-scale models
  • frontier AI model builders
  • optimization of models
  • inference serving
  • distributed training at scale
  • large-scale workloads
  • frontier models
  • multi-modal systems
  • model serving architectures
  • training optimization
  • distributed training strategies
  • model optimization techniques
  • fine-tuning approaches
  • efficient serving techniques
  • deep infrastructure and systems background
  • hands-on ML/AI expertise
  • large-scale training
  • systematic performance tuning

Other signals

  • customer facing
  • infrastructure
  • optimization
  • training
  • inference