Staff Data Engineer (8+ Years, Hadoop, Hive, Scala, Python)

Visa · Fintech · Bengaluru, India

Staff Data Engineer with 8+ years of experience in building and managing enterprise-scale data engineering pipelines using Hadoop, Hive, and Spark. Responsibilities include designing, implementing, and testing scalable distributed systems, leading code reviews, and collaborating with product managers. Requires strong experience in Scala, Python, or Java, advanced SQL, Linux scripting, and familiarity with scheduling tools like Airflow. Experience with AWS and streaming technologies is a plus.

What you'd actually do

  1. Strong technology and leadership background in building and managing enterprise-scale data engineering pipelines (Hadoop, Hive, Spark)
  2. Should possess a solid understanding of data engineering principles and best practices
  3. Responsible for the design, implementation, and testing of scalable distributed systems that ensure the standardization, security, timeliness, and quality of data.
  4. Lead code reviews, ensuring adherence to coding standards and promoting clean, efficient code within the team.
  5. Experience creating and supporting production software and systems, with a proven track record of identifying and resolving performance bottlenecks in production systems.

Skills

Required

  • 8+ years of work experience with a Bachelor's degree or an advanced degree
  • Hadoop ecosystem and associated technologies (Apache Spark, etc.)
  • Writing and optimizing Spark and Hive code
  • Advanced SQL for extracting, aggregating, and processing big data on Hadoop
  • Scala, Python or Java
  • Linux systems, with shell or Python scripting
  • Scheduling tools such as Airflow and Control-M

Nice to have

  • Continuous integration and automated testing tools such as Jenkins, Artifactory, Git, Selenium, and Chef
  • RDBMSs (e.g., MS SQL, DB2, Oracle) for data retrieval
  • Streaming technologies such as Kafka and Spark Streaming, along with Apache Hudi
  • AWS services such as EC2, S3, and SageMaker, or their equivalents and ecosystems on other public clouds
  • Visualization tools such as Tableau and Power BI