Staff Data Engineer

Dropbox · Enterprise · Mexico · CTO-Data Science, AI Platform & Eng (Sub Team)

Dropbox is hiring a Staff Data Engineer to join the Analytics Data Engineering (ADE) team within Data Science & AI Platform. The role is responsible for solving cross-cutting data challenges, standardizing analytics pipelines, modernizing the analytics platform, and laying the foundation for AI-native data development, in close partnership with Data Science, Data Infrastructure, Product Engineering, and Business Intelligence teams. Key responsibilities include designing and implementing shared data models, standardizing data engineering practices, architecting shift-left data governance, and evaluating and integrating AI-native tooling.

What you'd actually do

Lead the design and implementation of shared, reusable data models, defining shared fact tables, conformed dimensions, and a semantic/metrics layer that serves as the single source of truth across analytics functions
Drive standardization of data engineering practices across ADE and functional analytics teams, including pipeline patterns, CI/CD workflows, naming conventions, and data modeling standards
Partner with Data Infrastructure to modernize orchestration, improve pipeline decomposition, and establish secure dev/test environments with production data access
Architect and implement a shift-left data governance strategy, working with upstream data producers to establish data contracts, SLOs, and code-enforced quality gates that catch issues before production
Collaborate with Data Science leads and Product Management to translate metric definitions into reliable, certified data pipelines that power executive dashboards, WBR reporting, and growth measurement

Skills

Required

BS degree in Computer Science or related technical field, or equivalent technical experience
12+ years of experience in data engineering or analytics engineering with increasing scope and technical leadership
12+ years of SQL experience, including complex analytical queries, window functions, and performance optimization at scale (Spark SQL)
8+ years of Python development experience, including building and maintaining production data pipelines
Deep expertise in dimensional data modeling, schema design, and scalable data architecture, with hands-on experience building shared data models across multiple business domains
Strong experience with orchestration tools (Airflow strongly preferred) and dbt, including pipeline design, scheduling strategies, and failure recovery patterns
Demonstrated ability to drive cross-team technical alignment, establishing standards, influencing without authority, and working across Data Engineering, Data Science, Data Infrastructure, and Product Engineering boundaries

Nice to have

Experience with Databricks (Unity Catalog, Delta Lake) and modern lakehouse architectures
Experience leading orchestration or platform modernization efforts at scale
Familiarity with data governance and observability tools such as Atlan, Monte Carlo, Great Expectations, or similar
Experience building or contributing to a metrics/semantic layer (dbt MetricFlow, Databricks Metric Views, or equivalent)
Track record of establishing data engineering standards and best practices in a federated analytics organization

What the JD emphasized

12+ years of experience in data engineering or analytics engineering with increasing scope and technical leadership
12+ years of SQL experience, including complex analytical queries, window functions, and performance optimization at scale (Spark SQL)
8+ years of Python development experience, including building and maintaining production data pipelines
Deep expertise in dimensional data modeling, schema design, and scalable data architecture, with hands-on experience building shared data models across multiple business domains
Strong experience with orchestration tools (Airflow strongly preferred) and dbt, including pipeline design, scheduling strategies, and failure recovery patterns
Demonstrated ability to drive cross-team technical alignment, establishing standards, influencing without authority, and working across Data Engineering, Data Science, Data Infrastructure, and Product Engineering boundaries

Other signals

Modernizing our analytics platform
Building shared and reusable data models
Establishing a certified metrics framework
Laying the foundation for AI-native data development
Evaluating and integrating AI-native tooling into the data development lifecycle

Role Description

Dropbox is looking for a Staff Data Engineer to join our Analytics Data Engineering (ADE) team within Data Science & AI Platform. You will be responsible for solving cross-cutting data challenges that span multiple lines of business while driving standardization in how we build, deploy, and govern analytics pipelines across Dropbox.

This is not a maintenance role. We are modernizing our analytics platform, upgrading orchestration infrastructure, building shared and reusable data models with conformed dimensions, establishing a certified metrics framework, and laying the foundation for AI-native data development. You will partner closely with Data Science, Data Infrastructure, Product Engineering, and Business Intelligence teams to make this happen.

You will play a crucial role in establishing analytics engineering standards, designing scalable data models, and driving cross-functional alignment on data governance. You will get substantial exposure to senior leadership, shape the technical direction of analytics infrastructure at Dropbox, and directly influence how data powers product and business decisions.

Our Engineering Career Framework is publicly viewable and describes what’s expected of our engineers at each career level. Check out our blog post on this topic and more.

Responsibilities

Lead the design and implementation of shared, reusable data models, defining shared fact tables, conformed dimensions, and a semantic/metrics layer that serves as the single source of truth across analytics functions
Drive standardization of data engineering practices across ADE and functional analytics teams, including pipeline patterns, CI/CD workflows, naming conventions, and data modeling standards
Partner with Data Infrastructure to modernize orchestration, improve pipeline decomposition, and establish secure dev/test environments with production data access
Architect and implement a shift-left data governance strategy, working with upstream data producers to establish data contracts, SLOs, and code-enforced quality gates that catch issues before production
Collaborate with Data Science leads and Product Management to translate metric definitions into reliable, certified data pipelines that power executive dashboards, WBR reporting, and growth measurement
Reduce operational burden by improving pipeline granularity, observability, and failure recovery, establishing runbooks and alerting standards that make on-call sustainable
Evaluate and integrate AI-native tooling into the data development lifecycle, enabling conversational data exploration with guardrails and AI-assisted pipeline development

On-call work may be necessary occasionally to help address bugs, outages, or other operational issues, with the goal of maintaining a stable and high-quality experience for our customers.

Requirements

BS degree in Computer Science or related technical field, or equivalent technical experience
12+ years of experience in data engineering or analytics engineering with increasing scope and technical leadership
12+ years of SQL experience, including complex analytical queries, window functions, and performance optimization at scale (Spark SQL)
8+ years of Python development experience, including building and maintaining production data pipelines
Deep expertise in dimensional data modeling, schema design, and scalable data architecture, with hands-on experience building shared data models across multiple business domains
Strong experience with orchestration tools (Airflow strongly preferred) and dbt, including pipeline design, scheduling strategies, and failure recovery patterns
Demonstrated ability to drive cross-team technical alignment, establishing standards, influencing without authority, and working across Data Engineering, Data Science, Data Infrastructure, and Product Engineering boundaries

Preferred Qualifications

Experience with Databricks (Unity Catalog, Delta Lake) and modern lakehouse architectures
Experience leading orchestration or platform modernization efforts at scale
Familiarity with data governance and observability tools such as Atlan, Monte Carlo, Great Expectations, or similar
Experience building or contributing to a metrics/semantic layer (dbt MetricFlow, Databricks Metric Views, or equivalent)
Track record of establishing data engineering standards and best practices in a federated analytics organization
