Sr Advanced AI Data Engineer

Honeywell · Industrial · Monterrey, NLE, Mexico

Senior Advanced Data Engineer responsible for designing, developing, and maintaining advanced data solutions, including building scalable data pipelines, optimizing data storage, and ensuring data quality for AI/ML consumption. The role focuses on building an AI-ready data platform using Databricks and Azure, implementing data modeling, orchestration, and governance, with a strong emphasis on enabling AI and proactive analytics.

What you'd actually do

  1. Design and implement end-to-end ingestion pipelines from heterogeneous sources (including Snowflake, SQL Server, Excel, REST APIs, and unstructured data) into Azure Databricks
  2. Architect and enforce a Medallion Architecture (Bronze → Silver → Gold), ensuring data arrives clean, validated, and fit for purpose at each layer
  3. Build Delta Live Tables (DLT) pipelines with declarative data quality expectations, schema evolution, and automated lineage tracking
  4. Implement incremental loading patterns using CDC (Change Data Capture), watermarking, and Delta Lake MERGE/UPSERT for efficient, scalable ingestion
  5. Enable structured and unstructured data processing (documents, Excel files, JSON, Parquet), building the foundation for AI and ML consumption
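
For readers unfamiliar with the incremental loading pattern named in item 4, here is a minimal sketch of watermarking combined with upsert semantics. It is plain Python standing in for a Delta Lake MERGE and is not taken from the posting or from Honeywell's platform. The function name `incremental_upsert`, the `id` key, and the `updated_at` column are hypothetical names chosen for illustration.

```python
from datetime import datetime


def incremental_upsert(target, source_rows, last_watermark):
    """Upsert rows changed after last_watermark into target (a dict keyed by id).

    Hypothetical illustration: 'id' and 'updated_at' are assumed column names.
    Returns the new watermark to persist for the next run.
    """
    new_watermark = last_watermark
    for row in source_rows:
        # Watermarking: skip rows already covered by a previous load.
        if row["updated_at"] <= last_watermark:
            continue
        existing = target.get(row["id"])
        # Upsert: insert a new key, or overwrite only with a newer version.
        if existing is None or existing["updated_at"] < row["updated_at"]:
            target[row["id"]] = row
        # Advance the watermark to the latest change seen in this batch.
        if row["updated_at"] > new_watermark:
            new_watermark = row["updated_at"]
    return new_watermark


# Usage: the first run loads everything, the second picks up only the change.
target = {}
batch = [
    {"id": 1, "updated_at": datetime(2024, 1, 1), "value": "a"},
    {"id": 2, "updated_at": datetime(2024, 1, 2), "value": "b"},
]
watermark = incremental_upsert(target, batch, datetime.min)
batch.append({"id": 1, "updated_at": datetime(2024, 1, 3), "value": "a2"})
watermark = incremental_upsert(target, batch, watermark)
```

In a Databricks setting the same idea would typically be expressed as a filtered read of the source followed by a MERGE into a Delta table, with the watermark stored between runs.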

Skills

Required

  • Databricks
  • PySpark
  • Delta Lake
  • Workflows
  • Unity Catalog
  • Medallion Architecture
  • Domain Data Modeling
  • Functional Data Architecture
  • Data Quality Frameworks
  • Incremental loading
  • CDC
  • CI/CD
  • Observability
  • Python
  • SQL
  • Azure Databricks
  • Production environments

Nice to have

  • DLT
  • GCP
  • Azure
  • Kafka
  • Databricks Certified Professional
  • financial datasets
  • engineering datasets
  • enterprise datasets
  • industrial-scale datasets

What the JD emphasized

  • 4+ years hands-on: PySpark, Delta Lake, Workflows, Unity Catalog
  • Demonstrated expertise in data strategy, for example Medallion Architecture, Domain Data Modeling, and Functional Data Architecture
  • Data Quality Frameworks (e.g. rule-based validation, anomaly detection)
  • Data Pipelines: incremental loading, CDC, CI/CD, Observability
  • Advanced Python/PySpark and Advanced SQL
  • Proven experience building platforms, not just maintaining them: greenfield builds, migrations, framework development
  • Demonstrated ability to own technical decisions end-to-end: from architecture to production deployment

Other signals

  • AI-ready data platform
  • foundation for AI and ML consumption
  • supporting model training, feature stores, and real-time inference pipelines