Data Engineer II, Dash Device Operations

Amazon · Big Tech · DIF, Mexico +1 · Software Development

Data Engineer II role focused on building and operating large-scale data infrastructure to support AI/ML workloads, intelligent automation, and agentic systems. The role involves designing and implementing data pipelines, ETL/ELT processes, and AI-ready data foundations, with a strong emphasis on enabling data for machine learning models and agents. The candidate will collaborate with scientists and engineers to ensure data infrastructure meets the needs of ML training, inference, and agentic automation.

What you'd actually do

  1. Design, implement, and operate scalable data pipelines (batch and real-time) that serve analytics, reporting, and AI/ML workloads
  2. Build and maintain data infrastructure that supports AI-ready datasets — structured for consumption by machine learning models, agents, and natural language interfaces
  3. Interface with technology teams to extract, transform, and load data from diverse sources using SQL, Python, and distributed computing frameworks
  4. Implement data models and ETL/ELT processes using best practices in dimensional modeling, data vault, or hybrid approaches on MPP data warehouses
  5. Build robust data integration pipelines using SQL, Python, and Spark across batch and streaming paradigms

Skills

Required

  • Bachelor's degree in a quantitative/technical field such as computer science, engineering, or statistics
  • 4+ years of data engineering, database engineering, business intelligence or business analytics experience
  • Experience writing complex, highly optimized SQL queries across large datasets
  • 4+ years of experience with a development, programming, or scripting language (Python/Java/Bash/Perl)
  • Experience with data warehouse technical architectures, data modeling, infrastructure components, ETL/ELT and reporting/analytic tools and environments, data structures, and hands-on SQL coding
  • Experience with Redshift, or experience with Hive/Spark/HBase/YARN and experience with Kafka
  • Experience with AWS services including S3, Redshift, SageMaker, EMR, Kinesis, Lambda, and EC2
  • Knowledge of distributed systems as they pertain to data storage and computing

Nice to have

  • Master's degree in engineering, statistics, computer science, mathematics, or a related quantitative field
  • Experience with data infrastructure (relational analytic DBMS, Elasticsearch, and Big Data EMR/EC2/Glue/Lambda), or experience with training and deploying machine learning systems to solve large-scale optimizations
  • Experience with infrastructure as code, ops automation, and configuration management tools such as Chef, Puppet, or Ansible
  • Experience communicating with users, other technical teams, and management to collect requirements and to describe data modeling decisions and data engineering strategy
  • Experience as a mentor, tech lead or leading an engineering team, or experience debugging, profiling, and implementing best software engineering practices in large-scale systems
  • Knowledge of software engineering best practices across the development life cycle, including agile methodologies, coding standards, code reviews, source management, build processes, testing, and operations

What the JD emphasized

  • large-scale, high-performance data infrastructure
  • AI/ML workloads
  • intelligent automation
  • agentic systems
  • AI-ready data foundations
  • machine learning models
  • agents
  • ML training
  • inference
  • agentic automation systems
  • large volumes of data
  • highly complex technical contexts
  • data-driven decisions at scale
  • data modeling
  • ETL design
  • data warehousing
  • data engineering and AI/ML
  • well-structured data infrastructure
  • intelligent systems
  • fast-paced, collaborative environment
  • scalable data pipelines
  • AI-ready datasets
  • diverse sources
  • dimensional modeling
  • data vault
  • hybrid approaches
  • MPP data warehouses
  • batch and streaming paradigms
  • business analysis
  • customer reporting
  • AI/ML feature engineering
  • scientists and application engineers
  • business customers
  • data solutions
  • dataset designs
  • pipeline architectures
  • tooling
  • dataset documentation
  • metadata
  • data lineage artifacts
  • junior data engineers
  • data engineering
  • code quality
  • operational excellence
  • data pipelines
  • operational systems
  • warehouse schemas
  • thousands of daily queries
  • infrastructure for cost and performance
  • AI/ML capabilities
  • accessible, reliable, and well-governed
  • scientists, BI engineers, and application developers
  • traditional analytics
  • emerging intelligent systems
  • SQL optimization
  • real-time CDC pipelines
  • agent to query data programmatically
  • scalable data platforms
  • analytical frameworks
  • AI-powered solutions
  • reporting infrastructure
  • Device Operations & Supply Chain
  • business intelligence
  • science capabilities
  • operational decision-making
  • data infrastructure
  • AI innovation
  • human analysts
  • intelligent agents
  • Device Ops users
  • quantitative/technical field
  • computer science
  • engineering
  • statistics
  • 4+ years of data engineering
  • database engineering
  • business intelligence
  • business analytics experience
  • complex, highly optimized SQL queries
  • large datasets
  • 4+ years of development/programming/scripting language
  • Python/Java/Bash/Perl
  • data warehouse technical architectures
  • data modeling
  • infrastructure components
  • ETL/ELT
  • reporting/analytic tools and environments
  • data structures
  • hands-on SQL coding
  • Redshift
  • Hive/Spark/HBase/YARN
  • Kafka
  • AWS services
  • S3
  • Redshift
  • SageMaker
  • EMR
  • Kinesis
  • Lambda
  • EC2
  • distributed systems
  • data storage and computing
  • Master's degree
  • engineering
  • statistics
  • computer science
  • mathematics
  • quantitative field
  • data infrastructures
  • relational analytic DBMS
  • Elasticsearch
  • Big Data EMR/EC2/Glue/Lambda
  • training and deploying machine learning systems
  • large-scale optimizations
  • infrastructure as code
  • ops automation
  • configuration management tools
  • Chef
  • Puppet
  • Ansible
  • communicating with users
  • other technical teams
  • management
  • collect requirements
  • describe data modeling decisions
  • data engineering strategy
  • mentor
  • tech lead
  • leading an engineering team
  • debugging
  • profiling
  • implementing best software engineering practices
  • large-scale systems
  • software engineering best practices
  • development life cycle
  • agile methodologies
  • coding standards
  • code reviews
  • source management
  • build processes
  • testing
  • operations

Other signals

  • design, build, and operate large-scale, high-performance data infrastructure that powers analytics, AI/ML workloads, and intelligent automation
  • build real-time and batch pipelines
  • enable AI-ready data foundations that support both traditional BI and emerging agentic systems
  • partner with scientists and application engineers to ensure data infrastructure meets the needs of ML training, inference, and agentic automation systems