Data Engineer III

Walmart · Retail · Chennai, India

Data Engineer III role focused on building and maintaining data pipelines, data lakes, and consumption layers for Walmart's Marketplace team. Requires experience with Big Data technologies (Hadoop, Spark, Hive, Kafka, Airflow), cloud platforms (AWS, Azure, GCP), and SQL/Python/Java.

What you'd actually do

  1. Design and build data pipelines to analyze huge datasets, create visualizations that surface insights, and partner with the product team to identify and act on the resulting opportunities (a minimal orchestration sketch follows this list).
  2. Build and maintain the infrastructure and platform needed to maximize data-driven impact.
  3. Develop and implement best-in-class data pipelines, data lakes, and consumption/acceleration layers that make efficient use of capacity and meet SLAs.
  4. Translate business requirements into code, targeted analytical reports, and tools.
  5. Design, build, test, and deploy cutting-edge solutions at scale that impact a multi-billion-dollar business.
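
To make the responsibilities above concrete, here is a minimal sketch of the kind of pipeline orchestration the role describes, assuming Airflow 2.x with the Python operator. The DAG id, task ids, and the extract/transform/load callables are hypothetical placeholders, not anything taken from the JD.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Pull raw data from an upstream source (e.g. Kafka topics or object storage).
    print("extracting raw data")


def transform():
    # Clean and aggregate the raw data (in practice, often a Spark/Dataproc job).
    print("transforming data")


def load():
    # Publish curated tables to the consumption layer (e.g. Hive or BigQuery).
    print("loading curated output")


with DAG(
    dag_id="marketplace_daily_pipeline",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Run the three stages in sequence.
    extract_task >> transform_task >> load_task
```

In a real pipeline the transform step would typically hand off to a Spark or Dataproc job and the load step would land curated tables in Hive or BigQuery, matching the stack listed under Skills.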

Skills

Required

  • Bachelor’s degree in computer science or related discipline with 3+ years’ experience
  • Minimum 2 years of experience in Big Data and distributed computing.
  • Proven experience building pipelines on a Big Data stack – Hadoop, Spark, Hive, Dataproc, BigQuery, Kafka, and Airflow, to name a few.
  • Deep understanding of the Hadoop ecosystem, strong conceptual knowledge of Hadoop architecture components, and experience working with at least one Big Data technology in Java, Python, or Scala.
  • Strong knowledge of deploying and managing applications in AWS, Azure, or GCP.
  • Strong scripting skills for processing large amounts of data and high proficiency in SQL (see the Spark + SQL sketch after this list).
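
As a rough illustration of the Spark-plus-SQL skill set asked for above, here is a minimal PySpark sketch. The paths, table name, and column names are hypothetical examples, not anything specified by the JD.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orders_daily_rollup").getOrCreate()

# Read a (hypothetical) Parquet dataset of marketplace orders from the data lake.
orders = spark.read.parquet("/data/lake/orders/")  # placeholder path
orders.createOrReplaceTempView("orders")

# Express the aggregation in SQL, since the JD stresses SQL proficiency.
daily_gmv = spark.sql("""
    SELECT order_date,
           seller_id,
           SUM(order_amount) AS gmv,
           COUNT(*)          AS order_count
    FROM orders
    GROUP BY order_date, seller_id
""")

# Write the curated output to the consumption layer, partitioned for fast reads.
daily_gmv.write.mode("overwrite").partitionBy("order_date").parquet(
    "/data/curated/daily_gmv/"  # placeholder path
)

spark.stop()
```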

Nice to have

  • Solid knowledge of Linux systems with the ability to troubleshoot issues in complex, distributed, multi-tier architectures.
  • Experience building secure, scalable, and highly available services.
  • Experience with data science and machine learning is a plus.
  • Experience building the infrastructure required for extraction, transformation, and loading of data from a wide variety of sources using SQL and GCP ‘big data’ technologies.
  • Excellent hands-on working knowledge of object-oriented/functional scripting languages: Python, Java, C++, Scala.
  • Experience with microservices frameworks.
  • Good written and verbal communication skills.

What the JD emphasized

  • Proven experience building pipelines on a Big Data stack – Hadoop, Spark, Hive, Dataproc, BigQuery, Kafka, and Airflow, to name a few.
  • Deep understanding of the Hadoop ecosystem, strong conceptual knowledge of Hadoop architecture components, and experience working with at least one Big Data technology in Java, Python, or Scala.
  • Strong knowledge of deploying and managing applications in AWS, Azure, or GCP.