Data Engineer, PXT Central Science

Amazon · Big Tech · Seattle, WA · Data Science

Data Engineer role focused on building and maintaining data pipelines, feature extraction frameworks, and APIs for productionizing science models within Amazon's People Experience and Technology (PXT) Central Science team. The role involves using AWS services for data processing, model serving, and integration with other engineering teams.

What you'd actually do

  1. Design and maintain scalable data pipelines using native AWS services (Glue, EMR, Lambda); build monitoring and error handling for data workflows; optimize performance, reliability, and cost efficiency
  2. Develop and maintain APIs and data serving layers that productionize science models for downstream consumption; build batch and real-time inference pipelines
  3. Build scalable feature extraction and processing frameworks for diverse data types; develop robust data quality and validation checks; create flexible schemas supporting evolving requirements
  4. Partner with economics, data science, and software engineering teams to translate analytical requirements into production-ready solutions; participate in technical design reviews and architecture discussions
  5. Maintain layered data systems used by economists and scientists; build automated reporting solutions; work across multiple interconnected AWS accounts with security best practices
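The batch-inference and data-quality responsibilities above (items 2 and 3) can be sketched in plain Python. This is an illustrative sketch, not Amazon's actual stack: the schema, `validate_record`, and `run_batch` are hypothetical names, and the rejected list stands in for what would be a dead-letter queue or error table in a real AWS pipeline.

```python
"""Minimal sketch of a batch inference step with validation and error
handling. All names here are illustrative, not Amazon APIs."""

from dataclasses import dataclass, field
from typing import Callable

# Hypothetical required schema for incoming records.
REQUIRED_FIELDS = {"employee_id", "tenure_months"}


@dataclass
class BatchResult:
    scored: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # failed validation


def validate_record(record: dict) -> bool:
    # Data quality check: required fields present and non-null.
    return REQUIRED_FIELDS <= record.keys() and all(
        record[f] is not None for f in REQUIRED_FIELDS
    )


def run_batch(records, model: Callable[[dict], float]) -> BatchResult:
    result = BatchResult()
    for rec in records:
        if not validate_record(rec):
            # In production this would route to a dead-letter store
            # and emit a monitoring metric instead of silently collecting.
            result.rejected.append(rec)
            continue
        result.scored.append({**rec, "score": model(rec)})
    return result
```

A toy model closes the loop: `run_batch(records, lambda r: r["tenure_months"] / 100)` scores valid records and quarantines the rest, which is the shape a Glue or Lambda batch job would take around a real model endpoint.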

Skills

Required

  • Knowledge of professional software engineering & best practices for full software development life cycle, including coding standards, software architectures, code reviews, source control management, continuous deployments, testing, and operational excellence
  • 3+ years of data engineering experience
  • Experience in at least one modern scripting or programming language, such as Python, Java, Scala, or Node.js
  • Experience with data modeling, data warehousing, and building ETL pipelines
  • Experience with AWS technologies like Redshift, S3, AWS Glue, EMR, Kinesis, Kinesis Data Firehose, Lambda, and IAM roles and permissions
  • Experience with non-relational databases / data stores (object storage, document or key-value stores, graph databases, column-family databases)
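The "building ETL pipelines" requirement above follows a pattern that can be sketched without any AWS dependency. Everything here is generic and hedged: `extract`, `transform`, and `load` are conventional stage names, and the in-memory dict stands in for a real warehouse like Redshift or an S3-backed table.

```python
"""Hedged sketch of the extract-transform-load pattern named in the
required skills; the in-memory dict is a stand-in for a warehouse."""


def extract(source_rows):
    # Extract: stream raw rows (in practice, from S3 or a source database).
    yield from source_rows


def transform(rows):
    # Transform: normalize string-typed raw fields and derive a column.
    for row in rows:
        yield {
            "user_id": int(row["user_id"]),
            "amount_usd": round(float(row["amount_cents"]) / 100, 2),
        }


def load(rows, warehouse: dict) -> dict:
    # Load: idempotent upsert keyed by user_id, so reruns are safe.
    for row in rows:
        warehouse[row["user_id"]] = row
    return warehouse


warehouse = load(transform(extract([{"user_id": "1", "amount_cents": "1999"}])), {})
```

Using generators for extract/transform keeps the stages composable and memory-bounded, which is the same property Glue or EMR jobs rely on at larger scale.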

Nice to have

  • Experience with big data technologies such as Hadoop, Hive, Spark, and EMR
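The frameworks listed above all execute variants of the map/reduce model. As a dependency-free illustration (not Spark's actual API), the two phases can be sketched in pure Python; `map_phase` plays the role of a `flatMap`-style step and `reduce_phase` of a `reduceByKey`-style aggregation.

```python
"""Pure-Python sketch of the map/reduce pattern that Hadoop and Spark
implement at scale; function names here are illustrative only."""

from collections import Counter


def map_phase(lines):
    # Map: emit (word, 1) pairs from each input line.
    return [(w, 1) for line in lines for w in line.split()]


def reduce_phase(pairs):
    # Reduce: sum the counts per key.
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


counts = reduce_phase(map_phase(["a b a", "b c"]))
# counts == {"a": 2, "b": 2, "c": 1}
```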

What the JD emphasized

  • productionize science models
  • data serving layers
  • inference pipelines

Other signals

  • develop and maintain APIs and data serving layers
  • build batch and real-time inference pipelines