Sr Data Engineer

Honeywell · Industrial · Atlanta, GA

Senior Data Engineer at Honeywell focused on building scalable data pipelines and architectures for industrial AI solutions, with an emphasis on IoT data, RAG systems, and supporting the AI product lifecycle. The role involves significant work with Databricks, PySpark, and cloud platforms (Azure, GCP).

What you'd actually do

  1. Design and implement scalable data architectures to process high-volume IoT sensor data and telemetry streams, ensuring reliable data capture and processing for AI/ML workloads
  2. Build and maintain data pipelines for AI product lifecycle, including training data preparation, feature engineering, and inference data flows
  3. Develop and optimize RAG (Retrieval Augmented Generation) systems, including vector databases, embedding pipelines, and efficient retrieval mechanisms
  4. Lead the architecture and development of scalable data platforms on Databricks
  5. Drive the integration of GenAI capabilities into data workflows and applications
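The medallion-style flow implied by items 1–4 can be sketched in miniature. This is a conceptual illustration only, using plain Python dicts as stand-ins for what would really be PySpark DataFrames and Delta Live Tables on Databricks; the field names (`sensor_id`, `temp_c`) are hypothetical.

```python
# Conceptual Bronze -> Silver -> Gold sketch using plain Python.
# A production Databricks pipeline would express these steps as PySpark
# transformations managed by Delta Live Tables; field names are illustrative.

def to_silver(bronze_records):
    """Clean raw IoT readings: drop malformed rows, normalize types."""
    silver = []
    for r in bronze_records:
        if r.get("sensor_id") is None or r.get("temp_c") is None:
            continue  # drop/quarantine malformed telemetry at the Silver layer
        silver.append({"sensor_id": r["sensor_id"], "temp_c": float(r["temp_c"])})
    return silver

def to_gold(silver_records):
    """Aggregate cleaned readings into per-sensor averages for downstream AI/BI."""
    sums, counts = {}, {}
    for r in silver_records:
        sid = r["sensor_id"]
        sums[sid] = sums.get(sid, 0.0) + r["temp_c"]
        counts[sid] = counts.get(sid, 0) + 1
    return {sid: sums[sid] / counts[sid] for sid in sums}

bronze = [
    {"sensor_id": "a1", "temp_c": "21.5"},
    {"sensor_id": "a1", "temp_c": "22.5"},
    {"sensor_id": None, "temp_c": "99"},  # malformed row filtered at Silver
]
print(to_gold(to_silver(bronze)))  # {'a1': 22.0}
```

The same shape (ingest raw, validate/clean, aggregate for consumption) is what the Bronze/Silver/Gold layers formalize at TB scale.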

Skills

Required

  • Databricks
  • Delta Lake
  • Delta Live Tables (DLT)
  • Lakeflow
  • PySpark
  • Azure
  • GCP
  • Databricks Asset Bundles (DAB)
  • Git workflows
  • GitHub Actions
  • DataOps
  • RAG
  • vector databases
  • LLM integration
  • LangChain
  • LangGraph

Nice to have

  • Apache Spark Streaming
  • Structured Streaming
  • MLOps
  • time-series databases
  • IoT data modeling
  • Docker
  • Kubernetes
  • data quality implementation for AI training data
  • Agile
  • Scrum

What the JD emphasized

  • Minimum 5 years of experience building production data pipelines in Databricks, processing TB-scale data
  • Extensive experience implementing medallion architecture (Bronze/Silver/Gold) with Delta Lake, Delta Live Tables (DLT), and Lakeflow for batch and streaming pipelines
  • Strong hands-on proficiency with PySpark for distributed data processing and transformation
  • Strong experience working with cloud platforms such as Azure, GCP, and Databricks, especially in designing and implementing AI/ML-driven data workflows
  • Proficient in CI/CD practices using Databricks Asset Bundles (DAB), Git workflows, GitHub Actions, and understanding of DataOps practices including data quality testing and observability
  • Hands-on experience building RAG applications with vector databases, LLM integration, and agentic frameworks such as LangChain and LangGraph
  • Natural analytical mindset with demonstrated ability to explore data, debug complex distributed systems, and optimize pipeline performance at scale
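The RAG requirement above centers on retrieval against a vector database. A minimal sketch of that retrieval step, with toy in-memory vectors standing in for a real embedding model and vector index (all document texts and vectors here are made up for illustration):

```python
# Minimal sketch of the retrieval step in a RAG system: rank stored document
# embeddings by cosine similarity to a query embedding and return the top k.
# Toy vectors stand in for a real embedding model and vector database.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_vec, doc_store, k=2):
    """Return the texts of the k documents whose embeddings are closest to the query."""
    ranked = sorted(doc_store, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [d["text"] for d in ranked[:k]]

docs = [
    {"text": "pump vibration anomaly report", "vec": [0.9, 0.1, 0.0]},
    {"text": "HVAC maintenance schedule",     "vec": [0.1, 0.9, 0.1]},
    {"text": "compressor vibration log",      "vec": [0.8, 0.2, 0.1]},
]
print(retrieve([1.0, 0.0, 0.0], docs, k=2))
```

In a real pipeline the embedding and indexing would be handled by the vector database, and the retrieved passages would be fed into an LLM prompt; frameworks like LangChain wrap exactly this loop.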

Other signals

  • design and implement scalable data architectures and pipelines that enable next-generation AI capabilities
  • transform high-volume IoT telemetry into reliable, actionable insights that support Honeywell’s connected industrial solutions
  • Partner with ML engineers and data scientists to implement efficient data workflows for model training, fine-tuning, and deployment