Big Data Software Engineer - Python

JPMorgan Chase · Banking · Ciudad Autónoma de Buenos Aires, Argentina · Corporate Sector

Software Engineer III at JPMorgan Chase focused on building a big data platform for Entity Resolution and Relationships within the Commercial & Investment Bank. The role involves acquiring, managing, and transforming data using cloud technologies like AWS and Databricks, with a focus on ETL and data quality.
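Entity resolution means deciding which records from different sources refer to the same real-world entity, such as the same client company appearing in two systems. The sketch below is only a rough illustration of the idea and does not describe JPMorgan's platform. It groups records by a normalized company-name key. The record fields, the suffix list, and the `normalize_name` and `resolve_entities` helpers are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical legal-form suffixes to ignore when comparing company names.
LEGAL_SUFFIXES = {"inc", "incorporated", "llc", "ltd", "limited", "corp", "corporation", "co"}


def normalize_name(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop trailing legal-form suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def resolve_entities(records: list[dict]) -> dict[str, list[str]]:
    """Cluster record ids whose normalized names match exactly (deterministic matching)."""
    clusters: dict[str, list[str]] = defaultdict(list)
    for rec in records:
        clusters[normalize_name(rec["name"])].append(rec["id"])
    return dict(clusters)


# Illustrative records from two hypothetical source systems.
records = [
    {"id": "crm-1", "name": "Acme Corp."},
    {"id": "loans-7", "name": "ACME Corporation"},
    {"id": "crm-2", "name": "Globex, Inc."},
]
print(resolve_entities(records))  # {'acme': ['crm-1', 'loans-7'], 'globex': ['crm-2']}
```

Exact matching on a normalized key is only the simplest baseline. Production entity-resolution systems typically add fuzzy or probabilistic matching across several attributes.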

What you'd actually do

  1. Acquire and manage data from primary and secondary data sources
  2. Identify, analyze, and interpret trends or patterns in complex data sets
  3. Transform existing ETL logic on AWS and Databricks
  4. Devise new ways of managing, transforming, and validating data
  5. Implement new services and scripts, or enhance existing ones, using both object-oriented and functional programming

Skills

Required

  • advanced Python programming
  • Pandas
  • NumPy
  • Spark
  • Kafka
  • Databricks
  • ETL transformations
  • AWS services (EC2, EMR, ASG, Lambda, EKS, RDS)
  • API development
  • SQL queries
  • linear algebra
  • statistics
  • algorithms
  • UNIX shell scripting
  • data quality testing
  • relational databases (Oracle, SQL Server)
  • analytical skills
  • attention to detail
  • accuracy
  • development discipline
  • best practices and standards

Nice to have

  • Data Science
  • Machine Learning
  • AI
  • Financial Services
  • Commercial banking
  • NoSQL platforms (MongoDB, Amazon OpenSearch Service)

What the JD emphasized

  • Extensive experience using libraries such as Pandas and NumPy
  • Experience writing code and building infrastructure for Big Data technologies (e.g., Spark, Kafka, Databricks) and implementing complex ETL transformations
  • Experience with AWS services including EC2, EMR, ASG, Lambda, EKS, RDS and others
  • Strong understanding of linear algebra, statistics, and algorithms
  • Strong experience with UNIX shell scripting to automate file preparation and database loads
  • Experience in data quality testing; adept at writing test cases and scripts, presenting and resolving data issues
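The data quality point (writing test cases and scripts that surface data issues) can be illustrated with a small sketch. It assumes a hypothetical batch of loaded rows with `id`, `amount`, and `currency` fields and checks completeness, key uniqueness, and value validity. The rules, the currency set, and the `check_batch` function are illustrative assumptions, not taken from the posting. Plain Python is used to keep the sketch self-contained. In practice the same checks are often written as Pandas or Spark operations.

```python
from collections import Counter

# Hypothetical rules for an illustrative batch of loaded rows.
REQUIRED_FIELDS = ("id", "amount", "currency")
VALID_CURRENCIES = {"USD", "ARS", "EUR"}


def check_batch(rows: list[dict]) -> list[str]:
    """Return a list of human-readable data quality issues (empty if the batch is clean)."""
    issues: list[str] = []
    # Completeness: every required field must be present and non-empty.
    for i, row in enumerate(rows):
        for field in REQUIRED_FIELDS:
            if row.get(field) in (None, ""):
                issues.append(f"row {i}: missing {field}")
    # Uniqueness: the primary key must not repeat across rows.
    counts = Counter(row["id"] for row in rows if row.get("id") not in (None, ""))
    for key, n in counts.items():
        if n > 1:
            issues.append(f"duplicate id {key!r} ({n} rows)")
    # Validity: amounts must be non-negative numbers; currencies must be known codes.
    for i, row in enumerate(rows):
        amount = row.get("amount")
        if amount not in (None, ""):
            if not isinstance(amount, (int, float)) or amount < 0:
                issues.append(f"row {i}: invalid amount {amount!r}")
        currency = row.get("currency")
        if currency and currency not in VALID_CURRENCIES:
            issues.append(f"row {i}: unknown currency {currency!r}")
    return issues


# Illustrative batch containing one issue of each kind.
rows = [
    {"id": "a1", "amount": 100.0, "currency": "USD"},
    {"id": "a1", "amount": -5, "currency": "USD"},
    {"id": "a2", "amount": 20, "currency": "XYZ"},
    {"id": "a3", "amount": None, "currency": "EUR"},
]
for issue in check_batch(rows):
    print(issue)
```

Returning the issues as a list, rather than raising on the first one, lets a single run report every problem in a load. That list can then be presented and triaged.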