We are looking for a Senior Data Engineer with 6+ years of experience to design, build, and scale cloud‑native data and AI platforms on Azure using Databricks. The role requires strong hands‑on expertise in data engineering, lakehouse architecture, and AI/ML data pipelines to support advanced analytics, machine learning, and business intelligence use cases.
The ideal candidate will lead complex data initiatives, collaborate closely with data scientists and ML engineers, and play a key role in shaping the organization’s data and AI strategy.
Key Responsibilities
- Architect and develop end‑to‑end data pipelines on Azure using Databricks (Spark / PySpark)
- Design and maintain lakehouse architectures using Azure Data Lake + Delta Lake
- Build and optimize batch and streaming pipelines for large‑scale datasets
- Create and manage feature pipelines and curated datasets for AI/ML model training and inference
- Collaborate with data scientists and ML engineers to enable scalable ML workflows
- Support MLOps pipelines, including data versioning, feature stores, and model deployment readiness
- Optimize Databricks workloads for performance, scalability, and cost efficiency
- Implement data quality, validation, monitoring, and observability frameworks
- Ensure data security, governance, and compliance using Azure and Databricks best practices
- Review code, define standards, and mentor junior and mid‑level data engineers
- Lead architectural decisions and contribute to data platform roadmap planning
Required Skills & Qualifications
- 6+ years of hands‑on experience in Data Engineering or Data Platform roles
- Strong proficiency in Python, PySpark, and Spark SQL
- Extensive experience with Databricks (jobs, notebooks, workflows, Delta Live Tables)
- Strong experience with Azure Cloud services, including:
- Azure Data Lake Storage (ADLS Gen2)
- Azure Databricks
- Azure Data Factory / Synapse Pipelines
- Solid understanding of Delta Lake, including optimization and ACID guarantees
- Advanced SQL skills for analytical data modeling
- Experience designing AI/ML data pipelines (training, validation, and inference datasets)
- Knowledge of data warehousing, lakehouse, and dimensional modeling concepts
- Familiarity with CI/CD, Git, and DevOps practices
- Strong troubleshooting, performance tuning, and problem‑solving skills
- Experience developing, orchestrating, and maintaining scalable, DAG‑based data pipelines for reliable and efficient data processing
Preferred / Nice‑to‑Have Skills
- Experience with ML platforms such as Azure Machine Learning or Databricks ML
- Hands‑on experience with Feature Store, MLflow, or experiment tracking
- Streaming data experience (Kafka, Event Hubs, Spark Structured Streaming)
- Experience with dbt, Unity Catalog, or data governance tools
- Knowledge of BI and visualization tools (Power BI preferred)
- Exposure to MLOps best practices and production ML systems
- Prior experience as a technical lead or mentor
- Knowledge of LangChain, AI agents, and agent architectures