Lead Data Scientist - Gen AI & Digital Twin

Caterpillar · Industrial · Chicago, IL

Lead Data Scientist role focused on developing and integrating digital twins and GenAI-assisted predictive analytics for condition monitoring of Caterpillar equipment. Involves algorithm development (anomaly detection, GANs), digital twin engineering (NVIDIA architecture), optimization for GPUs, edge deployment on NVIDIA Jetson, and developing Generative AI agents for automated diagnostics and unified data orchestration.

What you'd actually do

  1. Design and implement GPU-accelerated machine learning models (e.g., XGBoost, autoencoders, and GANs using Tesseract) to identify fault patterns in time-series sensor data.
  2. Partner with engineering teams to develop onboard digital twins using NVIDIA architecture (e.g., PhysicsNeMo) to simulate, predict, and optimize the performance of heavy machinery.
  3. Adapt and test algorithms for onboard architecture, leveraging tools like NVIDIA Jetson for ROM generation and real-time edge processing on Cat equipment.
  4. Develop Generative AI agents that synthesize telematics data to generate prioritized repairs for identified machine faults.
  5. Integrate multi-modal outputs from condition monitoring analytics and asset life history to create machine-specific context for the AI assistant.

Skills

Required

  • Python programming
  • advanced data analysis
  • machine learning (clustering, logistic regression, neural nets)
  • statistical methods (statistical process control)
  • Fine-tuning and Prompt Engineering for Large Language Models
  • Retrieval-Augmented Generation (RAG)
  • Anomaly Detection
  • Time-Series Analysis
  • Predictive Maintenance models
  • handling high-frequency IoT sensor data
  • CAN bus protocols (J1939)
  • High performance computing
  • version control / repositories such as GitHub
  • Agile environment

Nice to have

  • practical applications of onboard architecture / software (e.g., mini projects using Raspberry Pi or any other architecture are a bonus)
  • heavy equipment engineering or data analysis
  • cloud technologies (AWS, Azure, Google Cloud, etc.)

What the JD emphasized

  • GPU-accelerated machine learning models
  • NVIDIA architecture
  • NVIDIA Jetson
  • Generative AI agents
  • Fine-tuning and Prompt Engineering for Large Language Models
  • Retrieval-Augmented Generation (RAG)

Other signals

  • digital twins
  • GenAI
  • predictive analytics
  • condition monitoring
  • edge deployment