Lead Data Engineer

Honeywell · Industrial · Atlanta, GA

Lead Data Engineer on the Industrial AI & Data Platforms team, responsible for architecting and owning the data foundations that enable physical AI at scale, spanning IoT sensor telemetry and Generative AI pipelines. The role combines technical leadership, building AI-ready data products such as vector stores and RAG workflows, and mentoring engineers.

What you'd actually do

  1. Architect end-to-end data pipelines processing terabytes of IoT telemetry on Azure Databricks (PySpark, Delta Live Tables, Lakeflow) using a medallion lakehouse architecture.
  2. Design and optimize real-time ingestion pipelines from Azure Event Hubs and Apache Kafka for high-volume industrial IoT telemetry.
  3. Build fault-tolerant, idempotent streaming architectures handling schema evolution, backpressure, and latency SLAs.
  4. Lead architecture reviews, set engineering standards, and drive decisions on data modeling, pipeline design, and platform evolution.
  5. Define technical direction for AI-ready data products including vector stores, embedding pipelines, and RAG-ready structured/unstructured data.
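
The Bronze → Silver → Gold layering in step 1 can be sketched in plain Python. Field names (`device_id`, `ts`, `temp_c`) are hypothetical; a production pipeline would express these layers as PySpark / Delta Live Tables transformations rather than list comprehensions:

```python
# Minimal sketch of medallion layering over IoT telemetry, in plain Python.
# Bronze keeps raw records; Silver deduplicates and validates; Gold aggregates.
from collections import defaultdict

# Bronze: raw records as received (duplicates, nulls, schema drift allowed).
bronze = [
    {"device_id": "pump-1", "ts": 1, "temp_c": 71.0},
    {"device_id": "pump-1", "ts": 1, "temp_c": 71.0},   # duplicate event
    {"device_id": "pump-1", "ts": 2, "temp_c": None},   # bad reading
    {"device_id": "pump-2", "ts": 1, "temp_c": 64.5, "fw": "2.1"},  # new field
]

def to_silver(records):
    """Silver: deduplicate on (device_id, ts) and drop invalid readings.
    Keying on the natural event identity keeps the layer idempotent under
    replayed input - reprocessing the same Bronze data yields the same Silver."""
    seen, out = set(), []
    for r in records:
        key = (r["device_id"], r["ts"])
        if key in seen or r.get("temp_c") is None:
            continue
        seen.add(key)
        # Project onto the contracted columns; unknown fields are tolerated
        # on ingest but not propagated (one simple schema-evolution policy).
        out.append({"device_id": r["device_id"], "ts": r["ts"],
                    "temp_c": r["temp_c"]})
    return out

def to_gold(records):
    """Gold: business-level aggregate - mean temperature per device."""
    sums = defaultdict(lambda: [0.0, 0])
    for r in records:
        sums[r["device_id"]][0] += r["temp_c"]
        sums[r["device_id"]][1] += 1
    return {dev: round(s / n, 2) for dev, (s, n) in sums.items()}

silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'pump-1': 71.0, 'pump-2': 64.5}
```

The idempotent dedup in `to_silver` is the same property point 3 asks for at streaming scale, where it would be handled by watermarking and checkpointed state rather than an in-memory set.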

Skills

Required

  • 8+ years of data engineering experience, including at least 2 years in a lead or senior role
  • Building and operating medallion lakehouse architectures (Bronze / Silver / Gold)
  • Apache Spark / PySpark and Azure Databricks
  • Streaming platforms: Apache Kafka and/or Azure Event Hubs
  • Cloud data architecture (Azure preferred)
  • Data modeling and schema design
  • Building data pipelines for GenAI or ML applications: RAG systems, embedding pipelines, and document ingestion
  • MLOps familiarity, including model versioning, feature stores, and monitoring/observability for data and ML systems
  • Leading technical design reviews, mentoring engineers, and driving architectural decisions with stakeholder buy-in
  • CI/CD with GitHub Actions

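The RAG requirement (document ingestion → embeddings → vector retrieval) can be illustrated end to end with a toy sketch. The hashed bag-of-words "embedding" below is a stand-in for a real encoder model, and the in-memory list stands in for a managed vector store (e.g. Databricks Vector Search); only the pipeline shape is the point:

```python
# Toy RAG-style pipeline: ingest documents, embed them, retrieve by similarity.
import math

DIM = 64

def embed(text: str) -> list[float]:
    """Hashed bag-of-words vector - a placeholder for a learned embedding."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[hash(token) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    # Vectors are already unit-normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

# Ingestion: chunk documents and index (chunk, embedding) pairs.
docs = [
    "vibration sensor telemetry from the compressor line",
    "quarterly finance report for the industrial division",
    "maintenance manual for pump bearing replacement",
]
index = [(d, embed(d)) for d in docs]

def retrieve(query: str, k: int = 1):
    """Return the top-k chunks by cosine similarity to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

print(retrieve("compressor vibration telemetry"))
```

In production the same three stages (ingest/chunk, embed, index/retrieve) survive; what changes is the encoder, the store, and the fact that ingestion itself becomes a governed data pipeline feeding the Gold layer.
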
Nice to have

  • LangChain, LangGraph, or other agentic AI orchestration frameworks
  • Real-time processing frameworks (Apache Spark Streaming, Structured Streaming)
  • MLOps practices
  • Time-series databases and IoT data modeling patterns
  • Containerization (Docker) and orchestration (Kubernetes)
  • Data quality implementation for AI training data
  • Distributed teams and cross-functional collaboration
  • Data security and governance practices for AI systems
  • Agile and Scrum methodologies
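
"Data quality implementation for AI training data" in practice means expectation-style checks gating what reaches a training set. A minimal sketch in plain Python (field names and thresholds are hypothetical; in a Databricks stack this role is typically played by Delta constraints, DLT expectations, or a library like Great Expectations):

```python
# Rule-based data-quality gate for a batch of telemetry destined for training.
def check_batch(rows, max_null_rate=0.1, temp_range=(-40.0, 150.0)):
    """Return (passed, report) for a batch of telemetry rows."""
    n = len(rows)
    nulls = sum(1 for r in rows if r.get("temp_c") is None)
    out_of_range = sum(
        1 for r in rows
        if r.get("temp_c") is not None
        and not (temp_range[0] <= r["temp_c"] <= temp_range[1])
    )
    report = {
        "rows": n,
        "null_rate": nulls / n if n else 0.0,
        "out_of_range": out_of_range,
    }
    passed = report["null_rate"] <= max_null_rate and out_of_range == 0
    return passed, report

good = [{"temp_c": 20.0}, {"temp_c": 21.5}]
bad = [{"temp_c": 20.0}, {"temp_c": None}, {"temp_c": 999.0}]
print(check_batch(good)[0], check_batch(bad)[0])  # True False
```

The report dict is the piece worth keeping even in a richer framework: emitted as metrics, it becomes the monitoring/observability signal the Required list asks for.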

What the JD emphasized

  • production experience on Azure Databricks at scale
  • real-time IoT data
  • building data pipelines for GenAI or ML applications: RAG systems, embedding pipelines, and document ingestion
  • monitoring/observability for data and ML systems
  • mentor engineers

Other signals

  • architect and own the data foundations that enable physical AI at scale
  • production-grade Generative AI pipelines
  • AI-ready data products
  • intersection of modern data engineering and applied GenAI