Lead Data Scientist – Gen AI for Condition Monitoring Analytics

Caterpillar · Industrial · Chicago, IL (+1 additional location)

Lead Data Scientist role focused on developing and integrating digital twins for condition monitoring and generative AI-assisted predictive analytics for heavy machinery. Responsibilities include algorithm development (anomaly detection, GANs), digital twin engineering, optimization for GPU architectures, edge deployment on NVIDIA Jetson, and development of generative AI agents for automated diagnostics and repair prioritization. The role also involves integrating multi-modal data and collaborating with a range of business partners.

What you'd actually do

  1. Design and implement GPU-accelerated machine learning models (e.g., XGBoost, autoencoders, and GANs) to identify irregular patterns in high-frequency sensor data.
  2. Partner with engineering teams to develop onboard digital twins on NVIDIA architecture that simulate, predict, and optimize the performance of heavy machinery.
  3. Develop generative AI agents that synthesize telematics data into prioritized repair recommendations for identified machine faults.
  4. Adapt and test algorithms for onboard architectures, using platforms such as NVIDIA Jetson for real-time edge processing on Cat equipment.
  5. Serve as technical lead on multiple complex projects, supported by junior team members.

Skills

Required

  • Python
  • Machine Learning
  • Generative AI
  • LLMs
  • Fine-tuning
  • Prompt Engineering
  • RAG
  • Anomaly Detection
  • Time-Series Analysis
  • Predictive Maintenance
  • Telematics
  • IoT sensor data
  • CAN bus protocols
  • High-performance computing
  • Statistical tools
  • Analytical Thinking
  • Requirements Analysis

Nice to have

  • NVIDIA architecture
  • NVIDIA Jetson
  • GANs
  • Digital Twins
  • Edge deployment
  • Hardware-Software Co-Design
  • Simulation-Based Training
  • Cloud technologies (AWS, Azure, Google Cloud, etc.)
  • Raspberry Pi
  • Heavy equipment engineering

What the JD emphasized

  • Generative AI & LLMs: Proficiency in fine-tuning and prompt engineering for large language models, with specific experience in Retrieval-Augmented Generation (RAG).
  • Condition Monitoring Algorithms: Deep understanding of Anomaly Detection, Time-Series Analysis, and Predictive Maintenance models.
  • Telematics: Experience handling high-frequency IoT sensor data and CAN bus protocols (SAE J1939), and integrating with unified data platforms.
  • Experience with high-performance computing.
  • Extensive experience applying Python programming (NumPy, SciPy, pandas, etc.) to solve business challenges.
  • Extensive experience (typically 8+ years) with advanced data analysis, machine learning methods such as clustering, logistic regression, and neural networks, and statistical methods such as statistical process control.

Other signals

  • Generative AI agents
  • digital twins for condition monitoring
  • predictive analytics
  • onboard architecture
  • edge deployment