Lead Data Scientist

Caterpillar · Industrial · Chennai, Tamil Nadu +1

The Lead Data Scientist at Caterpillar Digital focuses on developing and scaling a Manufacturing & Supply Digital Platform powered by NVIDIA technologies. The role involves coaching junior data scientists, implementing AI/ML and LLM models and optimization algorithms, and leading validation of AI/ML models using industrial analytics metrics. It emphasizes connecting diverse systems, data-driven decision making, automation, and enhanced collaboration within the manufacturing lifecycle.

What you'd actually do

  1. Coach and mentor junior data scientists in creating, validating, and applying statistical, machine learning, and LLM models, and in implementing AI solutions.
  2. Foster the team's creativity in solving business problems with machine learning techniques.
  3. Evaluate and implement modern optimization algorithms: stochastic search, evolutionary/genetic algorithms, reinforcement learning, and multi-objective optimization.
  4. Lead expert-level validation of AI/ML models using industrial analytics metrics such as yield improvement, downtime reduction, and prediction accuracy.
  5. Establish standards for data/model lifecycle management, production monitoring, and feedback from physical operations.

Skills

Required

  • machine learning
  • deep learning
  • LLM models
  • optimization algorithms
  • Python
  • manufacturing process analytics
  • multivariate analysis
  • defect analytics
  • OpenUSD-powered asset graphs
  • digital twins

Nice to have

  • cloud-native deployments
  • industrial IoT/edge analytics
  • data pipeline automation
  • plant/line telemetry
  • quality
  • maintenance
  • logistics
  • scheduling

What the JD emphasized

  • Deep expertise in machine learning and deep learning techniques
  • Rigorous evaluation methods
  • Team/project leadership with publications or patents in industrial ML or optimization

Other signals

  • AI/ML models
  • LLM models
  • AI solutions
  • optimization algorithms
  • industrial analytics metrics
  • data/model lifecycle management
  • production monitoring
  • composable model services
  • retrieval augmented generation
  • multimodal orchestration
  • digital twins
  • NVIDIA Omniverse
  • AI computing capabilities