What you'd actually do

Design and implement enterprise-scale AI/ML pipelines and model serving infrastructure that ensure optimal performance, reliability, and low-latency inference for both traditional ML models and generative AI systems.

Architect and build AI platform infrastructure that supports the complete model lifecycle, from training environments, feature stores, and validation frameworks to production deployment, A/B testing, and monitoring systems.

Develop and deploy generative AI solutions, including LLM-based applications, retrieval-augmented generation (RAG) systems, AI agents, and intelligent automation workflows.

Build and optimize AI model serving systems for production use, including model compression, quantization, prompt engineering pipelines, and efficient serving strategies to meet latency and throughput requirements.

Develop and maintain robust AI governance frameworks, implementing security controls, guardrails, responsible AI practices, and compliant data access patterns that protect sensitive information.

Skills

Required

3+ years of contributing to new and current systems architecture and design (architecture, design patterns, reliability and scaling)
Experience with Machine Learning and Large Language Model fundamentals, including architecture, training/inference lifecycles, and optimization of model execution
Experience in machine learning, data mining, information retrieval, statistics or natural language processing

Nice to have

3+ years of experience across the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
Master's degree in computer science or equivalent
Experience in developing and deploying LLMs in production on GPUs, Neuron, TPU or other AI acceleration hardware

What the JD emphasized

enterprise-scale AI/ML systems

high-volume inference workloads

AI governance frameworks

scalable AI-powered products

low-latency inference

AI platform infrastructure

generative AI solutions

AI agents

AI model serving systems

responsible AI practices

AI/ML services

AI/ML engineering methodologies

AI research

AI platforms

AI system design

Other signals

design and build robust, scalable AI/ML systems and infrastructure

architect end-to-end AI pipelines for model training, evaluation, and deployment

develop production-grade AI services including generative AI, large language models (LLMs), and intelligent agent systems

build infrastructure that supports the complete lifecycle of AI models

enterprise-scale AI/ML systems that handle high-volume inference workloads

implement comprehensive model and AI governance frameworks

build scalable AI-powered products that power critical business capabilities

We are looking for a Machine Learning Engineer to join the Data Intelligence team, which is part of the Amazon Customer Service (CS) team. In this role, you will design and build robust, scalable AI/ML systems and infrastructure. You'll architect end-to-end AI pipelines for model training, evaluation, and deployment, implement secure and efficient data processing solutions, and develop production-grade AI services including generative AI, large language models (LLMs), and intelligent agent systems. Additionally, you'll build infrastructure that supports the complete lifecycle of AI models, from experimentation and development to production deployment and monitoring.

You'll work with cross-functional teams (e.g., scientists, product managers, data engineers) to create enterprise-scale AI/ML systems that handle high-volume inference workloads, implement comprehensive model and AI governance frameworks, and build scalable AI-powered products that power critical business capabilities.

If you enjoy solving complex AI and machine learning challenges in high-scale environments and working in a collaborative, dynamic team, and you want to make a lasting impact on Amazon Customer Service worldwide, this is your opportunity. Come join us on this exciting journey!

Key job responsibilities

Design and implement enterprise-scale AI/ML pipelines and model serving infrastructure that ensure optimal performance, reliability, and low-latency inference for both traditional ML models and generative AI systems.
Architect and build AI platform infrastructure that supports the complete model lifecycle, from training environments, feature stores, and validation frameworks to production deployment, A/B testing, and monitoring systems.
Develop and deploy generative AI solutions, including LLM-based applications, retrieval-augmented generation (RAG) systems, AI agents, and intelligent automation workflows.
Build and optimize AI model serving systems for production use, including model compression, quantization, prompt engineering pipelines, and efficient serving strategies to meet latency and throughput requirements.
Develop and maintain robust AI governance frameworks, implementing security controls, guardrails, responsible AI practices, and compliant data access patterns that protect sensitive information.
Drive technical architecture decisions and system design, focusing on scalability, reliability, and performance of distributed AI/ML services while ensuring alignment with business requirements.
Own end-to-end delivery of AI/ML solutions, including design, implementation, experimentation, and verification of components, using standard software engineering and AI/ML engineering methodologies and best practices.
Collaborate with cross-functional teams, including Product Managers, Applied Scientists, and Data Engineers, to understand requirements, conduct design reviews, and ensure successful delivery of AI solutions while maintaining high development standards.

A day in the life

A typical day as a Machine Learning Engineer involves architecting and building robust AI/ML infrastructure and intelligent systems that power critical AI initiatives. Your morning might start with reviewing model performance metrics and experiment results, collaborating with Applied Scientists to optimize LLM prompting strategies or model architectures, or working with Product Managers to plan AI product features.

Throughout the day, you'll write and review code for AI/ML pipelines, generative AI applications, and model serving systems, while monitoring and optimizing existing AI services for performance, accuracy, and reliability. You'll often find yourself diving deep into model behavior issues, implementing guardrails for responsible AI deployment, improving inference latency and throughput, and building new capabilities into our AI platforms. Cross-team collaboration is key, as you work closely with scientists to translate innovative AI research into production-ready systems and consult with data engineers to ensure high-quality feature and knowledge pipelines. As a senior member of the team, you'll also mentor junior engineers, sharing your expertise in AI system design and best practices.

About the team

The Data Intelligence team is a new function within Customer Engagement Technology. We own the end-to-end process of defining, building, implementing, and monitoring a comprehensive data and AI strategy. We also develop and apply Generative Artificial Intelligence (GenAI), Large Language Models (LLMs), Computer Vision, ML, Knowledge Graphs, and Natural Language Processing (NLP) to customer service associate experiences and foundational technologies.

Basic Qualifications

3+ years of experience contributing to new and current systems architecture and design (architecture, design patterns, reliability and scaling)
Experience with Machine Learning and Large Language Model fundamentals, including architecture, training/inference lifecycles, and optimization of model execution
Experience in machine learning, data mining, information retrieval, statistics or natural language processing

Preferred Qualifications

3+ years of experience across the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
Master's degree in computer science or equivalent
Experience in developing and deploying LLMs in production on GPUs, Neuron, TPU or other AI acceleration hardware

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.

The base salary range for this position is listed below. As a total compensation company, Amazon's package may include other elements such as sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon offers comprehensive benefits including health insurance (medical, dental, vision, prescription, basic life & AD&D insurance), Registered Retirement Savings Plan (RRSP), Deferred Profit Sharing Plan (DPSP), paid time off, and other resources to improve health and well-being. We thank all applicants for their interest; however, only those interviewed will be advised as to hiring status.

CAN, BC, Vancouver - 114,800.00 - 191,800.00 CAD annually
