Data Engineer, Seller Partner Trust and Store Integrity Science

Amazon · Big Tech · Seattle, WA · Data Science

Data Engineer role focused on building and maintaining scalable data infrastructure and pipelines to support ML model training and inference for fraud prevention in e-commerce. The role involves processing large volumes of data, optimizing ETL processes, ensuring data integrity, and collaborating with scientists to productionize ML models, with a focus on low latency and high reliability for inference.

What you'd actually do

  1. Design, build, and maintain scalable data pipelines that support multiple ML model training and inference workflows
  2. Develop and optimize ETL processes to ingest, transform, and prepare terabytes of data from diverse sources for model consumption
  3. Implement robust data quality checks and monitoring systems to ensure data integrity across all pipelines
  4. Build and maintain infrastructure for model training pipelines, including feature engineering, data versioning, and experiment tracking
  5. Design and implement scalable inference pipelines that serve predictions for millions of transactions with low latency and high reliability
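To make item 3 above concrete, here is a minimal sketch of the kind of data quality gate such a pipeline might run before a batch reaches model training. All names (`check_batch`, the field names, the 5% null threshold) are illustrative assumptions, not taken from the posting.

```python
# Hypothetical data-quality gate for a pipeline batch: reject the
# batch if any required field is null too often. Names and the
# threshold are illustrative, not from the job posting.

def check_batch(records, required_fields, max_null_rate=0.05):
    """Return (ok, report) where report maps each required field
    to its null rate across the batch."""
    counts = {f: 0 for f in required_fields}
    for rec in records:
        for f in required_fields:
            if rec.get(f) is None:
                counts[f] += 1
    total = len(records) or 1  # avoid division by zero on empty batches
    report = {f: n / total for f, n in counts.items()}
    ok = all(rate <= max_null_rate for rate in report.values())
    return ok, report
```

In practice a check like this would fail fast or quarantine the batch when `ok` is False, so bad data never silently reaches training or inference.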

Skills

Required

  • 3+ years of data engineering experience
  • 1+ years of experience developing and operating large-scale data structures for business intelligence analytics using ETL/ELT processes, OLAP technologies, data modeling, SQL, and Oracle
  • Experience with data modeling, warehousing, and building ETL pipelines

Nice to have

  • Experience with AWS technologies like Redshift, S3, AWS Glue, EMR, Kinesis, Kinesis Data Firehose, Lambda, and IAM roles and permissions
  • Experience with non-relational databases / data stores (object storage, document or key-value stores, graph databases, column-family databases)

What the JD emphasized

  • scalable data infrastructure and pipelines
  • terabytes of data
  • ML model training and inference workflows
  • low latency and high reliability

Other signals

  • building scalable data infrastructure and pipelines
  • process terabytes of data
  • enabling state-of-the-art algorithms
  • own end-to-end data systems
  • directly impact the team's ability to deliver insights and models
  • manage the safety of millions of transactions
  • scaling up our operations with automation
  • empower scientists to develop advanced machine learning systems
  • productionize ML models
  • translating research code into production-ready systems
  • low latency and high reliability