Data Engineer, PXT Central Science

Amazon · Big Tech · Seattle, WA · Data Science

Data Engineer on the PXT Central Science team, responsible for enhancing data architecture, building features, developing end-to-end data engineering solutions for analytical problems, and collaborating with scientists and engineers. Focuses on data pipeline development, model productionization, API development, data integration, and quality checks within an AWS environment.

What you'd actually do

  1. Design and maintain scalable data pipelines using native AWS services (Glue, EMR, Lambda); build monitoring and error handling for data workflows; optimize performance, reliability, and cost efficiency
  2. Develop and maintain APIs and data serving layers that productionize science models for downstream consumption; build batch and real-time inference pipelines
  3. Build scalable feature extraction and processing frameworks for diverse data types; develop robust data quality and validation checks; create flexible schemas supporting evolving requirements
  4. Partner with economics, data science, and software engineering teams to translate analytical requirements into production-ready solutions; participate in technical design reviews and architecture discussions
  5. Maintain layered data systems used by economists and scientists; build automated reporting solutions; work across multiple interconnected AWS accounts with security best practices

Skills

Required

  • professional software engineering best practices
  • full software development life cycle
  • coding standards
  • software architectures
  • code reviews
  • source control management
  • continuous deployments
  • testing
  • operational excellence
  • 3+ years of data engineering experience
  • Python
  • Java
  • Scala
  • Node.js
  • data modeling
  • data warehousing
  • ETL pipelines
  • AWS Glue
  • AWS EMR
  • AWS Lambda
  • Redshift
  • S3
  • Kinesis
  • Firehose
  • non-relational databases
  • object storage
  • document or key-value stores
  • graph databases
  • column-family databases
  • Bachelor's degree or foreign equivalent in computer science, engineering, mathematics or equivalent

Nice to have

  • Hadoop
  • Hive
  • Spark
  • EMR
  • 4+ years of full software development life cycle experience
  • Bachelor's degree or above in computer science

What the JD emphasized

  • productionize science models
  • inference pipelines
  • feature extraction
  • data quality
  • analytical requirements

Other signals

  • Data Pipeline Development
  • Model Productionization & API Development
  • Data Integration & Quality