Principal Data Architect

Caterpillar · Industrial · Irving, TX

The Principal Digital Architect for Data is responsible for defining the data architecture that enables Physical AI and autonomous systems to learn, reason, and operate at scale. This role establishes the data domains, canonical data models, data products, and architectural standards that transform vast amounts of jobsite, machine, sensor, telemetry, simulation, and operational data into trusted, reusable assets for AI development and autonomy. The architect provides technical leadership for how data is organized, managed, and delivered across the enterprise, ensuring engineers and data scientists can efficiently develop, train, fine-tune, validate, and deploy next-generation AI models.

What you'd actually do

  1. Developing detailed architecture deliverables to solve business problems.
  2. Designing an application's technical infrastructure, such as specific databases, programming languages, utilities, and testing approaches.
  3. Leading the evaluation and deployment of new technologies to add or enhance existing digital technical capabilities.
  4. Helping address business requirements for applications and collaborating with cross-functional teams to deliver digital solutions that achieve business results.

Skills

Required

  • Data Architecture
  • Analytical Thinking
  • Platform Architecture
  • Requirements Analysis
  • Target Architecture

Nice to have

  • Data Architecture & Domain Modeling: Demonstrated ability to define enterprise data domains, canonical data models, data contracts, metadata strategies, and scalable architectures that support complex AI and autonomy ecosystems.
  • AI & Machine Learning Data Foundations: Deep understanding of the data requirements for AI model development, including training data, fine-tuning datasets, feature engineering, vector data, model evaluation data, synthetic data, and data pipelines that support the AI lifecycle.
  • Data Platforms & Data Products: Experience designing modern cloud and on-premises data platforms and data product architectures for massive data volumes and events, including streaming data, telemetry platforms, APIs, self-service data capabilities, and reusable data assets that accelerate engineer

What the JD emphasized

  • Data Architecture
  • Platform Architecture
  • Target Architecture
  • AI & Machine Learning Data Foundations
  • Data Platforms & Data Products

Other signals

  • data foundation for AI development
  • enabling AI models
  • autonomous systems
  • physical AI