Machine Learning Engineer, Amazon Customer Service

Amazon · Big Tech · CA, BC +1 · Software Development

Machine Learning Engineer on the Data Intelligence team within Amazon Customer Service, responsible for designing and building scalable AI/ML systems, end-to-end AI pipelines, and production-grade AI services including generative AI, LLMs, and intelligent agent systems. The role involves building infrastructure for the complete AI model lifecycle, handling high-volume inference, implementing AI governance, and developing AI-powered products.

What you'd actually do

  1. Design and implement enterprise-scale AI/ML pipelines and model serving infrastructure that ensure optimal performance, reliability, and low-latency inference for both traditional ML models and generative AI systems.
  2. Architect and build AI platform infrastructure that supports the complete model lifecycle, from training environments, feature stores, and validation frameworks to production deployment, A/B testing, and monitoring systems.
  3. Develop and deploy generative AI solutions, including LLM-based applications, retrieval-augmented generation (RAG) systems, AI agents, and intelligent automation workflows.
  4. Build and optimize AI model serving systems for production use, including model compression, quantization, prompt engineering pipelines, and efficient serving strategies to meet latency and throughput requirements.
  5. Develop and maintain robust AI governance frameworks, implementing security controls, guardrails, responsible AI practices, and compliant data access patterns that protect sensitive information.

Skills

Required

  • 3+ years of experience contributing to new and current systems architecture and design (architecture, design patterns, reliability, and scaling)
  • Experience with Machine Learning and Large Language Model fundamentals, including architecture, training/inference lifecycles, and optimization of model execution
  • Experience in machine learning, data mining, information retrieval, statistics or natural language processing

Nice to have

  • 3+ years of experience across the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
  • Master's degree in computer science or equivalent
  • Experience in developing and deploying LLMs in production on GPUs, Neuron, TPU or other AI acceleration hardware

What the JD emphasized

  • enterprise-scale AI/ML systems
  • high-volume inference workloads
  • AI governance frameworks
  • scalable AI-powered products
  • low-latency inference
  • AI platform infrastructure
  • generative AI solutions
  • AI agents
  • AI model serving systems
  • responsible AI practices
  • AI/ML services
  • AI/ML engineering methodologies
  • AI research
  • AI platforms
  • AI system design

Other signals

  • design and build robust, scalable AI/ML systems and infrastructure
  • architect end-to-end AI pipelines for model training, evaluation, and deployment
  • develop production-grade AI services including generative AI, large language models (LLMs), and intelligent agent systems
  • build infrastructure that supports the complete lifecycle of AI models
  • enterprise-scale AI/ML systems that handle high-volume inference workloads
  • implement comprehensive model and AI governance frameworks
  • build scalable AI-powered products that power critical business capabilities