Senior Machine Learning Engineer, Sponsored Products and Brands Relevance

Amazon · Big Tech · Palo Alto, CA · Software Development

Senior Machine Learning Engineer responsible for building and owning real-time ML serving systems for ad selection in Sponsored Products and Brands at Amazon. The role involves driving technical direction, designing scalable pipelines, optimizing model performance, and mentoring engineers, operating at massive scale with strict latency SLAs. It touches on deep learning, NLP, LLMs, and distributed systems, impacting shopper experience and advertiser ROI.

What you'd actually do

  1. Drive the technical direction of ML solutions across deep learning, AWS infrastructure, AutoML, and real-time serving systems
  2. Design, develop, and own scalable offline ML pipelines and online serving components that handle billions of requests per day at millisecond latency
  3. Partner closely with applied scientists to optimize model performance, improve ML productivity, and advance the technical foundation that powers science innovation
  4. Troubleshoot and support high-volume, low-latency distributed systems — what you build is what you own
  5. Mentor junior engineers and guide them to deliver high-impact products and services for Amazon customers and sellers

Skills

Required

  • 8+ years of non-internship professional software development experience
  • 10+ years of programming experience with at least one programming language
  • 5+ years of experience leading design or architecture (design patterns, reliability, and scaling) of new and existing systems
  • Experience as a mentor or tech lead, or leading an engineering team
  • Knowledge of Machine Learning and LLM fundamentals, including transformer architecture, training/inference lifecycles, and optimization techniques
  • 5+ years of experience building large-scale machine-learning infrastructure for online recommendation, ads ranking, personalization, or search
  • Demonstrated ability to drive technical decisions across teams and deliver end-to-end from design through production deployment

Nice to have

  • 10+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
  • Master's degree in computer science or equivalent
  • Experience with model serving infrastructure (SageMaker, Triton, vLLM, or equivalent)
  • Experience with LLM/GenAI systems in production — such as prompt engineering, fine-tuning, retrieval-augmented generation, or agentic workflows
  • Hands-on experience with ML frameworks such as PyTorch or TensorFlow
  • Experience with large-scale data processing using Spark

What the JD emphasized

  • billions of daily requests
  • millisecond latency
  • real-time ML serving systems
  • low-latency distributed systems

Other signals

  • massive scale
  • deep learning
  • LLMs