Data Engineer II, DASH Device Operations

Amazon · Big Tech · DIF, Mexico +1 · Software Development

Data Engineer II role focused on building and operating large-scale data infrastructure to support AI/ML workloads, intelligent automation, and agentic systems. The role involves designing and implementing data pipelines, ETL/ELT processes, and AI-ready data foundations, with a strong emphasis on enabling data for machine learning models and agents. The candidate will collaborate with scientists and engineers to ensure data infrastructure meets the needs of ML training, inference, and agentic automation.

What you'd actually do

Design, implement, and operate scalable data pipelines (batch and real-time) that serve analytics, reporting, and AI/ML workloads
Build and maintain data infrastructure that supports AI-ready datasets — structured for consumption by machine learning models, agents, and natural language interfaces
Interface with technology teams to extract, transform, and load data from diverse sources using SQL, Python, and distributed computing frameworks
Implement data models and ETL/ELT processes using best practices in dimensional modeling, data vault, or hybrid approaches on MPP data warehouses
Build robust data integration pipelines using SQL, Python, and Spark across batch and streaming paradigms

Skills

Required

Bachelor's degree in a quantitative/technical field such as computer science, engineering, or statistics
4+ years of data engineering, database engineering, business intelligence or business analytics experience
Experience in writing complex, highly-optimized SQL queries across large datasets
4+ years of experience with a development, programming, or scripting language (Python/Java/Bash/Perl)
Experience with data warehouse technical architectures, data modeling, infrastructure components, ETL/ELT, reporting/analytic tools and environments, data structures, and hands-on SQL coding
Experience in Redshift, or experience in Hive/Spark/HBase/YARN and experience in Kafka
Experience with AWS services including S3, Redshift, SageMaker, EMR, Kinesis, Lambda, and EC2
Knowledge of distributed systems as they pertain to data storage and computing

Nice to have

Master's degree in engineering, statistics, computer science, mathematics, or a related quantitative field
Experience with data infrastructures: relational analytic DBMS, Elasticsearch, and Big Data EMR/EC2/Glue/Lambda, or experience with training and deploying machine learning systems to solve large-scale optimizations
Experience with infrastructure as code, ops automation, and configuration management tools such as Chef, Puppet, or Ansible
Experience communicating with users, other technical teams, and management to collect requirements and describe data modeling decisions and data engineering strategy
Experience as a mentor, tech lead or leading an engineering team, or experience debugging, profiling, and implementing best software engineering practices in large-scale systems
Knowledge of software engineering best practices across the development life cycle, including agile methodologies, coding standards, code reviews, source management, build processes, testing, and operations

What the JD emphasized

large-scale, high-performance data infrastructure
AI/ML workloads
intelligent automation
agentic systems
AI-ready data foundations
machine learning models
agents
ML training
inference
agentic automation systems
large volumes of data
highly complex technical contexts
data-driven decisions at scale
data modeling
ETL design
data warehousing
data engineering and AI/ML
well-structured data infrastructure
intelligent systems
fast-paced, collaborative environment
scalable data pipelines
AI-ready datasets
diverse sources
dimensional modeling
data vault
hybrid approaches
MPP data warehouses
batch and streaming paradigms
business analysis
customer reporting
AI/ML feature engineering
scientists and application engineers
business customers
data solutions
dataset designs
pipeline architectures
tooling
dataset documentation
metadata
data lineage artifacts
junior data engineers
data engineering
code quality
operational excellence
data pipelines
operational systems
warehouse schemas
thousands of daily queries
infrastructure for cost and performance
AI/ML capabilities
accessible, reliable, and well-governed
scientists, BI engineers, and application developers
traditional analytics
emerging intelligent systems
SQL optimization
real-time CDC pipelines
agent to query data programmatically
scalable data platforms
analytical frameworks
AI-powered solutions
reporting infrastructure
Device Operations & Supply Chain
business intelligence
science capabilities
operational decision-making
data infrastructure
AI innovation
human analysts
intelligent agents
Device Ops users
quantitative/technical field
computer science
engineering
statistics
4+ years of data engineering
database engineering
business intelligence
business analytics experience
complex, highly-optimized SQL queries
large datasets
4+ years of development/programming/scripting language
Python/Java/Bash/Perl
data warehouse technical architectures
infrastructure components
ETL/ELT
reporting/analytic tools and environments
data structures
hands-on SQL coding
Redshift
Hive/Spark/HBase/YARN
Kafka
AWS services
S3
SageMaker
EMR
Kinesis
Lambda
EC2
distributed systems
data storage and computing
Master's degree
mathematics
quantitative field
data infrastructures
relational analytic DBMS
Elasticsearch
Big Data EMR/EC2/Glue/Lambda
training and deploying machine learning systems
large-scale optimizations
infrastructure as code
ops automation
configuration management tools
Chef
Puppet
Ansible
communicating with users
other technical teams
management
collect requirements
describe data modeling decisions
data engineering strategy
mentor
tech lead
leading an engineering team
debugging
profiling
implementing best software engineering practices
large-scale systems
software engineering best practices
development life cycle
agile methodologies
coding standards
code reviews
source management
build processes
testing
operations

Other signals

design, build, and operate large-scale, high-performance data infrastructure that powers analytics, AI/ML workloads, and intelligent automation
build real-time and batch pipelines
enable AI-ready data foundations that support both traditional BI and emerging agentic systems
partner with scientists and application engineers to ensure data infrastructure meets the needs of ML training, inference, and agentic automation systems

Full job description

We are seeking a talented, self-directed Data Engineer to design, build, and operate large-scale, high-performance data infrastructure that powers analytics, AI/ML workloads, and intelligent automation across Device Operations. You will implement data structures using best practices in data modeling and ETL/ELT processes, build real-time and batch pipelines, and enable AI-ready data foundations that support both traditional BI and emerging agentic systems. You will gather business and functional requirements and translate them into robust, scalable solutions that work within the broader data architecture. You will analyze source systems, drive best practices with partner teams, and participate in the full development lifecycle — from design and implementation to documentation, delivery, and operational support.

The ideal candidate relishes working with large volumes of data, enjoys the challenge of highly complex technical contexts, and is passionate about enabling data-driven decisions at scale. They are an expert in data modeling, ETL design, and data warehousing — and are energized by the intersection of data engineering and AI/ML, where well-structured data infrastructure creates an outsized impact on intelligent systems. They are a self-starter, comfortable with ambiguity, able to think big while paying careful attention to detail, and thrive in a fast-paced, collaborative environment.

Key job responsibilities

Design, implement, and operate scalable data pipelines (batch and real-time) that serve analytics, reporting, and AI/ML workloads

Build and maintain data infrastructure that supports AI-ready datasets — structured for consumption by machine learning models, agents, and natural language interfaces

Interface with technology teams to extract, transform, and load data from diverse sources using SQL, Python, and distributed computing frameworks

Implement data models and ETL/ELT processes using best practices in dimensional modeling, data vault, or hybrid approaches on MPP data warehouses

Build robust data integration pipelines using SQL, Python, and Spark across batch and streaming paradigms

Design and deliver high-quality datasets that support business analysis, customer reporting, and AI/ML feature engineering

Partner with scientists and application engineers to ensure data infrastructure meets the needs of ML training, inference, and agentic automation systems

Interface with business customers, gather requirements, and deliver complete, well-documented data solutions

Evaluate and make decisions around dataset designs, pipeline architectures, and tooling proposed by peer engineers

Produce comprehensive dataset documentation, metadata, and data lineage artifacts

Mentor junior data engineers on best practices in data engineering, code quality, and operational excellence

A day in the life

You will work across the full spectrum of data engineering: building pipelines that ingest from operational systems, designing warehouse schemas that serve thousands of daily queries, optimizing infrastructure for cost and performance, and enabling new AI/ML capabilities by making data accessible, reliable, and well-governed. You will collaborate with scientists, BI engineers, and application developers to solve problems that span traditional analytics and emerging intelligent systems. Some days you will be deep in SQL optimization; other days you will be designing real-time CDC pipelines or enabling a new agent to query data programmatically.

About the team

The Data, Analytics, and Science Hub (DASH) team builds scalable data platforms, analytical frameworks, AI-powered solutions, and reporting infrastructure to support Device Operations & Supply Chain. DASH serves multiple organizations across DeviceOps, delivering data engineering, business intelligence, and science capabilities that power operational decision-making. The team operates at the intersection of data infrastructure and AI innovation, building systems that serve both human analysts and intelligent agents supporting Device Ops users.

Basic Qualifications

Bachelor's degree in a quantitative/technical field such as computer science, engineering, or statistics
4+ years of data engineering, database engineering, business intelligence or business analytics experience
Experience in writing complex, highly-optimized SQL queries across large datasets
4+ years of experience with a development, programming, or scripting language (Python/Java/Bash/Perl)
Experience with data warehouse technical architectures, data modeling, infrastructure components, ETL/ELT, reporting/analytic tools and environments, data structures, and hands-on SQL coding
Experience in Redshift, or experience in Hive/Spark/HBase/YARN and experience in Kafka
Experience with AWS services including S3, Redshift, SageMaker, EMR, Kinesis, Lambda, and EC2
Knowledge of distributed systems as they pertain to data storage and computing

Preferred Qualifications

Master's degree in engineering, statistics, computer science, mathematics, or a related quantitative field
Experience with data infrastructures: relational analytic DBMS, Elasticsearch, and Big Data EMR/EC2/Glue/Lambda, or experience with training and deploying machine learning systems to solve large-scale optimizations
Experience with infrastructure as code, ops automation, and configuration management tools such as Chef, Puppet, or Ansible
Experience communicating with users, other technical teams, and management to collect requirements and describe data modeling decisions and data engineering strategy
Experience as a mentor, tech lead or leading an engineering team, or experience debugging, profiling, and implementing best software engineering practices in large-scale systems
Knowledge of software engineering best practices across the development life cycle, including agile methodologies, coding standards, code reviews, source management, build processes, testing, and operations

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.