Lead Data Scientist

Caterpillar · Industrial · Irving, TX +2

Lead Data Scientist for Caterpillar's Cybersecurity GRC team, focusing on Advanced Analytics and GenAI initiatives. Responsibilities include data discovery, preparation, and processing for ML/AI models, exploring semantic data capabilities, defining data analysis scope, researching data model optimization, planning technical deliverables, communicating insights, and developing/validating/training/implementing statistical and digital solutions. Requires extensive experience in business statistics, machine learning/AI, Python programming, database management, data analysis (SQL, ETL/ELT), and requirements analysis.

What you'd actually do

  1. Conduct data discovery, data preparation, and data processing for business intelligence and ML/AI models.
  2. Explore, promote, and implement semantic data capabilities through data analytics and machine learning techniques.
  3. Lead the definition of requirements and scope for data analyses; present and report potential business insights to management using data visualization technologies.
  4. Conduct research on data model optimization and algorithms to improve the effectiveness and accuracy of data analyses.
  5. Plan technical deliverables and the definition of done for each sprint for the Jr. Data Engineers.

Skills

Required

  • Business Statistics
  • Machine Learning / AI
  • Python programming
  • Database Management and Consumption
  • Data Analysis
  • SQL
  • Data Modeling
  • ETL / ELT
  • Requirements Analysis

Nice to have

  • Master’s degree in Applied Statistics, Data Science, Business Analytics, Business Intelligence & Analytics, Mathematics, Computer Science, Engineering, Informatics, Information Systems Management, MBA with Technical Undergrad, Predictive Analytics, Statistics, or equivalent technical degree.
  • Advanced data analysis and statistical methods such as regression, hypothesis testing, ANOVA, and statistical process control.
  • Practical applications of machine learning techniques such as clustering, logistic regression, random forests, SVM, or neural networks.
  • Quantifying the costs, benefits, risks, and chances of success before recommending a course of action.
  • In-depth technical and problem-solving skills and evidence of continuous learning in the analytics field.

What the JD emphasized

  • extensive experience with statistical tools
  • extensive knowledge of principles, technologies and algorithms of machine learning
  • extensive knowledge in the application of Python programming
  • extensive knowledge of data management systems
  • extensive capabilities in SQL, data modeling, and understanding of data processing including ETL / ELT
  • working knowledge of tools, methods, and techniques of requirement analysis

Other signals

  • GenAI initiatives
  • applied analytics
  • machine learning techniques
  • statistical models