We are looking for a Senior Data Engineer with 6+ years of experience to design, build, and scale cloud‑native data and AI platforms on Azure using Databricks. The role requires strong hands‑on expertise in data engineering, lakehouse architecture, and AI/ML data pipelines to support advanced analytics, machine learning, and business intelligence use cases.
The ideal candidate will lead complex data initiatives, collaborate closely with data scientists and ML engineers, and play a key role in shaping the organization’s data and AI strategy.
Key Responsibilities
- Architect and develop end‑to‑end data pipelines on Azure using Databricks (Spark / PySpark)
- Design and maintain lakehouse architectures using Azure Data Lake + Delta Lake
- Build and optimize batch and streaming pipelines for large‑scale datasets
- Create and manage feature pipelines and curated datasets for AI/ML model training and inference
- Collaborate with data scientists and ML engineers to enable scalable ML workflows
- Support MLOps pipelines, including data versioning, feature stores, and model deployment readiness
- Optimize Databricks workloads for performance, scalability, and cost efficiency
- Implement data quality, validation, monitoring, and observability frameworks
- Ensure data security, governance, and compliance using Azure and Databricks best practices
- Review code, define standards, and mentor junior and mid‑level data engineers
- Lead architectural decisions and contribute to data platform roadmap planning
Required Skills & Qualifications
- 6+ years of hands‑on experience in Data Engineering or Data Platform roles
- Strong proficiency in Python, PySpark, and Spark SQL
- Extensive experience with Databricks (jobs, notebooks, workflows, Delta Live Tables)
- Strong experience with Azure Cloud services, including:
- Azure Data Lake Storage (ADLS Gen2)
- Azure Databricks
- Azure Data Factory / Synapse Pipelines
- Solid understanding of Delta Lake, including optimization and ACID guarantees
- Advanced SQL skills for analytical data modeling
- Experience designing AI/ML data pipelines (training, validation, and inference datasets)
- Knowledge of data warehousing, lakehouse, and dimensional modeling concepts
- Familiarity with CI/CD, Git, and DevOps practices
- Strong troubleshooting, performance tuning, and problem‑solving skills
- Experience developing, orchestrating, and maintaining scalable, DAG‑based data pipelines for reliable and efficient data processing
Preferred / Nice‑to‑Have Skills
- Experience with ML platforms such as Azure Machine Learning or Databricks ML
- Hands‑on experience with Feature Store, MLflow, or experiment tracking
- Streaming data experience (Kafka, Event Hubs, Spark Structured Streaming)
- Experience with dbt, Unity Catalog, or data governance tools
- Knowledge of BI and visualization tools (Power BI preferred)
- Exposure to MLOps best practices and production ML systems
- Prior experience as a technical lead or mentor
- Knowledge of LangChain, AI agents, and agent architectures