Lead Software Engineer - Data Engineer (PySpark) Python, AWS

JPMorgan Chase · Banking · Plano, TX +1 · Consumer & Community Banking

Lead Software Engineer - Data Engineer role focused on architecting, developing, and deploying scalable data pipelines on AWS using services like S3, Redshift, Glue, EMR, Lambda, and Athena. The role involves creating and optimizing data models, building ETL processes, monitoring and tuning data pipelines, and staying current with AWS advancements. Requires proficiency in Python and SQL, plus experience with Apache Spark for large-scale data processing and with machine learning libraries. Experience with infrastructure automation tools and agile methodologies is also required.

What you'd actually do

  1. Architect, develop, and deploy scalable data pipelines and solutions on AWS using services such as S3, Redshift, Glue, EMR, Lambda, and Athena
  2. Create and optimize data models, build robust ETL processes, and ensure efficient data ingestion, transformation, and storage
  3. Monitor, troubleshoot, and tune data pipelines and cloud resources for optimal performance, reliability, and cost efficiency
  4. Maintain comprehensive documentation of data architectures, processes, and best practices; mentor junior engineers and share knowledge within the team
  5. Stay current with AWS advancements, evaluate new services and tools, and drive continuous improvement in cloud data engineering practices

Skills

Required

  • software engineering concepts
  • system design
  • application development
  • testing
  • operational stability
  • Java
  • Python
  • SQL
  • Scala
  • Apache Spark
  • AWS CloudFormation
  • Terraform
  • Software Development Life Cycle
  • agile methodologies
  • CI/CD
  • Application Resiliency
  • Security
  • AWS services (Aurora, DynamoDB, S3, RDS)
  • cloud
  • artificial intelligence
  • machine learning
  • mobile

Nice to have

  • Oracle
  • MySQL
  • SQL Server
  • Snowflake
  • Databricks