Data Engineer

Meta · Big Tech · Menlo Park, CA

Meta is seeking a Data Engineer to design, build, and launch data pipelines and tools that generate business insights. The role involves analyzing user needs, designing software and data solutions that support data-driven decisions, and shaping strategy for efficient data solutions and scalable data warehouse plans. Responsibilities include developing new data models and processes, leveraging ETL frameworks, collaborating with infrastructure, product, and engineering teams, and identifying and resolving data infrastructure issues.

What you'd actually do

  1. Design, build, and launch data pipelines to move data across systems and build the next generation of data tools that generate business insights for a product.
  2. Analyze user needs and software requirements to determine feasibility, and support end users on data usage.
  3. Design, architect, and develop software and data solutions that help product and business teams make data-driven decisions.
  4. Rethink and influence strategy and roadmap for building efficient data solutions and scalable data warehouse plans.
  5. Design, develop, test, and launch new data models and processes into production, and provide ongoing support.
  6. Identify and resolve data infrastructure issues.

Skills

Required

  • Data ETL (Extract, Transform, Load) design, implementation, and maintenance on a large scale
  • Data visualization via Tableau, R, or Python
  • Programming in Hack, C/C++, Python, Perl, Java, or PHP
  • Internet technologies: HTTP, HTML, CSS, or JavaScript
  • Writing and optimizing SQL statements
  • Analyzing large volumes of data to surface data-driven insights, gaps, and inconsistencies
  • Data governance standards and data privacy compliance
  • Data processing automation
  • Data warehousing architecture and plans
  • Informatica, Talend, Pentaho, dimensional data modeling, or schema design
  • MapReduce or MPP systems
  • Machine Learning and Artificial Intelligence fundamentals
  • Statistics methods: descriptive statistics, hypothesis testing, and regression analysis
  • Distributed processing technologies and frameworks, such as Hadoop, and distributed storage systems (e.g., HDFS, S3)
  • Spark programming: writing, debugging, and optimizing code