We are looking for a Senior Advanced Data Engineer with 9+ years of experience to architect, build, and scale cloud‑native data and AI platforms on Azure using Databricks. This is a senior, hands‑on technical leadership role requiring deep expertise in data engineering, lakehouse architecture, and AI/ML data pipelines to enable advanced analytics, machine learning, and business intelligence use cases.
The ideal candidate will lead complex, enterprise‑scale data initiatives, work closely with data scientists and ML engineers, and play a critical role in shaping the organization’s data and AI strategy.
Key Responsibilities
Architect, design, and lead the development of end‑to‑end data pipelines on Azure using Databricks (Spark / PySpark)
Own the design and evolution of lakehouse architecture using Azure Data Lake Storage (ADLS Gen2) and Delta Lake
Build, optimize, and scale batch and streaming pipelines for large‑volume, high‑velocity datasets
Design and manage feature engineering pipelines and curated datasets for AI/ML model training, validation, and inference
Partner closely with Data Scientists and ML Engineers to enable scalable, production‑ready ML workflows
Support and integrate with MLOps pipelines, including:
- Data and feature versioning
- Feature stores
- Model deployment readiness
Lead optimization of Databricks workloads for performance, scalability, reliability, and cost efficiency
Define and implement data quality, validation, monitoring, and observability frameworks
Enforce data security, governance, and compliance using Azure and Databricks best practices
Review designs and code, establish engineering standards, and ensure platform reliability
Mentor and technically guide senior, mid‑level, and junior data engineers
Lead architectural decision‑making and contribute to long‑term data platform and AI roadmap planning
Act as a technical authority and escalation point for complex data engineering challenges
Required Skills & Qualifications
9+ years of hands‑on experience in Data Engineering, Data Platform, or Big Data roles
Deep expertise in Python, PySpark, and Spark SQL
Extensive, real‑world experience with Databricks, including:
- Jobs, notebooks, and workflows
- Delta Live Tables
- Performance tuning and job orchestration
Strong experience with Azure cloud services, including:
- Azure Data Lake Storage (ADLS Gen2)
- Azure Databricks
- Azure Data Factory and/or Synapse Pipelines
Expert‑level understanding of Delta Lake, including ACID guarantees, schema enforcement, and optimizations
Advanced SQL skills for analytical data modeling and transformations
Proven experience designing AI/ML data pipelines (training, validation, inference datasets)
Strong understanding of lakehouse, data warehousing, and dimensional modeling concepts
Hands‑on experience with CI/CD pipelines, Git, and DevOps practices for data platforms
Excellent troubleshooting, diagnostics, and performance tuning skills
Strong communication and stakeholder collaboration abilities
Preferred / Nice to Have Skills
- Experience with Azure Machine Learning or Databricks ML
- Hands‑on experience with Feature Store, MLflow, or experiment tracking frameworks
- Streaming data experience using Kafka, Azure Event Hubs, or Spark Structured Streaming
- Experience with dbt, Unity Catalog, or enterprise data governance tools
- Familiarity with Power BI or other BI/visualization tools
- Strong exposure to production‑grade MLOps systems and best practices
- Prior experience as a Technical Lead, Principal Engineer, or Architecture Owner
- Knowledge of LangChain, agents, and agent architectures