Sr Data Engineer

Honeywell · Industrial · Atlanta, GA

Senior Data Engineer at Honeywell focused on building scalable data pipelines and architectures for industrial AI solutions, with an emphasis on IoT data, RAG systems, and supporting the AI product lifecycle. The role involves significant work with Databricks, PySpark, and cloud platforms (Azure, GCP).

What you'd actually do

  1. Design and implement scalable data architectures to process high-volume IoT sensor data and telemetry streams, ensuring reliable data capture and processing for AI/ML workloads
  2. Build and maintain data pipelines for AI product lifecycle, including training data preparation, feature engineering, and inference data flows
  3. Develop and optimize RAG (Retrieval Augmented Generation) systems, including vector databases, embedding pipelines, and efficient retrieval mechanisms
  4. Lead the architecture and development of scalable data platforms on Databricks
  5. Drive the integration of GenAI capabilities into data workflows and applications
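The medallion-style flow implied by items 1–4 can be sketched in miniature. This is a conceptual illustration only, using plain Python dicts as stand-ins for what would really be PySpark DataFrames and Delta Live Tables on Databricks; the field names (`sensor_id`, `temp_c`) are hypothetical.

```python
# Conceptual Bronze -> Silver -> Gold sketch using plain Python.
# A production Databricks pipeline would express these steps as PySpark
# transformations managed by Delta Live Tables; field names are illustrative.

def to_silver(bronze_records):
    """Clean raw IoT readings: drop malformed rows, normalize types."""
    silver = []
    for r in bronze_records:
        if r.get("sensor_id") is None or r.get("temp_c") is None:
            continue  # drop/quarantine malformed telemetry at the Silver layer
        silver.append({"sensor_id": r["sensor_id"], "temp_c": float(r["temp_c"])})
    return silver

def to_gold(silver_records):
    """Aggregate cleaned readings into per-sensor averages for downstream AI/BI."""
    sums, counts = {}, {}
    for r in silver_records:
        sid = r["sensor_id"]
        sums[sid] = sums.get(sid, 0.0) + r["temp_c"]
        counts[sid] = counts.get(sid, 0) + 1
    return {sid: sums[sid] / counts[sid] for sid in sums}

bronze = [
    {"sensor_id": "a1", "temp_c": "21.5"},
    {"sensor_id": "a1", "temp_c": "22.5"},
    {"sensor_id": None, "temp_c": "99"},  # malformed row filtered at Silver
]
print(to_gold(to_silver(bronze)))  # {'a1': 22.0}
```

The same shape (ingest raw, validate/clean, aggregate for consumption) is what the Bronze/Silver/Gold layers formalize at TB scale.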

Skills

Required

  • Databricks
  • Delta Lake
  • Delta Live Tables (DLT)
  • Lakeflow
  • PySpark
  • Azure
  • GCP
  • Databricks Asset Bundles (DAB)
  • Git workflows
  • GitHub Actions
  • DataOps
  • RAG
  • vector databases
  • LLM integration
  • LangChain
  • LangGraph

Nice to have

  • Apache Spark Streaming
  • Structured Streaming
  • MLOps
  • time-series databases
  • IoT data modeling
  • Docker
  • Kubernetes
  • data quality implementation for AI training data
  • Agile
  • Scrum

What the JD emphasized

  • Minimum 5 years of experience building production data pipelines in Databricks, processing TB-scale data
  • Extensive experience implementing medallion architecture (Bronze/Silver/Gold) with Delta Lake, Delta Live Tables (DLT), and Lakeflow for batch and streaming pipelines
  • Strong hands-on proficiency with PySpark for distributed data processing and transformation
  • Strong experience working with cloud platforms such as Azure, GCP, and Databricks, especially in designing and implementing AI/ML-driven data workflows
  • Proficient in CI/CD practices using Databricks Asset Bundles (DAB), Git workflows, GitHub Actions, and understanding of DataOps practices including data quality testing and observability
  • Hands-on experience building RAG applications with vector databases, LLM integration, and agentic frameworks such as LangChain and LangGraph
  • Natural analytical mindset with demonstrated ability to explore data, debug complex distributed systems, and optimize pipeline performance at scale
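The RAG requirement above centers on retrieval against a vector database. A minimal sketch of that retrieval step, with toy in-memory vectors standing in for a real embedding model and vector index (all document texts and vectors here are made up for illustration):

```python
# Minimal sketch of the retrieval step in a RAG system: rank stored document
# embeddings by cosine similarity to a query embedding and return the top k.
# Toy vectors stand in for a real embedding model and vector database.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_vec, doc_store, k=2):
    """Return the texts of the k documents whose embeddings are closest to the query."""
    ranked = sorted(doc_store, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [d["text"] for d in ranked[:k]]

docs = [
    {"text": "pump vibration anomaly report", "vec": [0.9, 0.1, 0.0]},
    {"text": "HVAC maintenance schedule",     "vec": [0.1, 0.9, 0.1]},
    {"text": "compressor vibration log",      "vec": [0.8, 0.2, 0.1]},
]
print(retrieve([1.0, 0.0, 0.0], docs, k=2))
```

In a real pipeline the embedding and indexing would be handled by the vector database, and the retrieved passages would be fed into an LLM prompt; frameworks like LangChain wrap exactly this loop.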

Other signals

  • design and implement scalable data architectures and pipelines that enable next-generation AI capabilities
  • transform high-volume IoT telemetry into reliable, actionable insights that support Honeywell’s connected industrial solutions
  • Partner with ML engineers and data scientists to implement efficient data workflows for model training, fine-tuning, and deployment