Manager, Data Engineering

Walmart · Retail · Denver, CO

Manager, Data Engineering role at Walmart in Denver, CO. This position supervises six employees and focuses on formulating tech problems, understanding business context, identifying data sources, and transforming/integrating data. Requires experience in ETL, database technologies, Python (PySpark), BI tools, statistical analysis, SQL, data privacy, cloud platforms, and data pipeline orchestration.

What you'd actually do

  1. Tech Problem Formulation. Requires knowledge of: Analytics/big data analytics/automation techniques and methods; Business understanding; Precedence and use cases; Business requirements and insights.
  2. Understanding Business Context. Requires knowledge of: Industry and environmental factors; Common business vernacular; Business practices across two or more domains such as product, finance, marketing, sales, technology, business systems, and human resources, and in-depth knowledge of related practices; Directly relevant business metrics and business areas.
  3. Data Source Identification. Requires knowledge of: Functional business domain and scenarios; Categories of data and where it is held; Business data requirements; Database technologies and distributed datastores (e.g. SQL, NoSQL); Data quality; Existing business systems and processes, including the key drivers and measures of success.
  4. Data Transformation and Integration. Requires knowledge of: Internal and external data sources, including how they are collected, where and how they are stored, and their interrelationships, both within and external to the organization; Techniques like ETL batch processing, streaming ingestion, scrapers, APIs, and crawlers; Data warehousing services for structured and semi-structured data, or MPP databases such as Snowflake, Microsoft Azure, Presto, or Google BigQuery; Pre-processing techniques such as transformation, integration, normalization, and feature extraction, and how to identify and apply appropriate methods; Techniques such as decision trees, random forests, and advanced regression methods such as LASSO; Cloud and big data environments like EDO2 systems.
  5. This position supervises six employees: Senior Data Scientist (1), Data Analyst II (2), Data Engineer (3).

Skills

Required

  • Experience in the design and development of Extract, Transform, Load (ETL) pipelines to move data from source systems to data warehouses and lakes.
  • Experience with database technologies such as Cassandra, MongoDB, MySQL, PostgreSQL, and Redis, and cloud data warehouses such as Snowflake, Google BigQuery, and Redshift.
  • Experience in programming languages like Python (PySpark) for data manipulation and pipeline automation at scale.
  • Experience in creating interactive dashboards and reports using BI tools like Tableau, Power BI and ThoughtSpot.
  • Experience with statistical analysis and tools like R and Python (Pandas and NumPy).
  • Experience using SQL to query large datasets and ensure data quality.
  • Experience with best practices in data privacy and governance.
  • Experience with cloud platforms such as AWS, Microsoft Azure, and Google Cloud Platform for big data storage, compute services, and data pipeline orchestration.
  • Experience with various revenue sources including impression data from ad tech and revenue systems.
  • Experience developing, maintaining, and optimizing data pipelines using a modern tech stack (Databricks, Snowflake, and Apache Airflow).
  • Expertise in television, soundbar and user data collected from various sources (Internet of Things (IoT), software, and external sources).

What the JD emphasized

  • Experience in the design and development of Extract, Transform, Load (ETL) pipelines to move data from source systems to data warehouses and lakes.
  • Experience with database technologies such as Cassandra, MongoDB, MySQL, PostgreSQL, and Redis, and cloud data warehouses such as Snowflake, Google BigQuery, and Redshift.
  • Experience in programming languages like Python (PySpark) for data manipulation and pipeline automation at scale.
  • Experience with cloud platforms such as AWS, Microsoft Azure, and Google Cloud Platform for big data storage, compute services, and data pipeline orchestration.
  • Experience developing, maintaining, and optimizing data pipelines using a modern tech stack (Databricks, Snowflake, and Apache Airflow).