What you'd actually do

Lead and manage day-to-day production support across application, platform, and data layers

Own end-to-end DataOps processes including data ingestion, transformation, pipelines, and data quality

Act as primary escalation point for P1/P2 incidents (application, platform, and data)

Define and implement monitoring and alerting strategies across applications and data pipelines

Lead a distributed team of support engineers, SREs, and DataOps engineers

Skills

Required

Bachelor’s degree in Engineering, Computer Science, Data Engineering, or related field
10–15+ years in IT operations, production support, SRE, or DataOps
5+ years of people leadership experience
Incident and problem management
Data pipeline operations (ETL/ELT, streaming, batch)
Cloud platforms (Azure/AWS)

Nice to have

DataOps
data platforms (e.g., Kafka, Spark, Data Lake)
ITSM tools (ServiceNow or equivalent)
Microservices and distributed systems
Data engineering and analytics pipelines
DevOps, AIOps, and automation

What the JD emphasized

day-to-day production support

SLA, SLO, and KPI metrics

rapid incident resolution

operational health, system performance, and availability

DataOps processes

availability, accuracy, and timeliness of manufacturing data pipelines

data-related incidents

data observability, lineage, and monitoring capabilities

data reliability (SLAs) and governance

P1/P2 incidents

root cause analysis (RCA)

post-incident review processes

MTTR through automation, predictive alerts, and proactive issue detection

monitoring and alerting strategies

end-to-end observability

system resilience, reliability, and automation

distributed team of support engineers, SREs, and DataOps engineers

performance, productivity, and cost optimisation

24x7 global support model

automation, AIOps, and DataOps practices

ITSM frameworks

incident trends, backlog management, and workflow efficiency

Engineering, Product, Data, and Plant stakeholders

new platform rollouts, plant onboarding, and hypercare phases

application, data, and architecture teams

SLA compliance

Incident volumes and trends

Data pipeline success rates and data quality metrics

MTTR / MTBF

System and data availability

operational and data excellence

Incident and problem management

Data pipeline operations (ETL/ELT, streaming, batch)

Cloud platforms (Azure/AWS)

DataOps

data platforms (e.g., Kafka, Spark, Data Lake)

ITSM tools (ServiceNow or equivalent)

Microservices and distributed systems

Data engineering and analytics pipelines

DevOps, AIOps, and automation

Career Area:

Technology, Digital and Data

Job Description:

**Your Work Shapes the World at Caterpillar Inc.**

When you join Caterpillar, you're joining a global team who cares not just about the work we do – but also about each other. We are the makers, problem solvers, and future world builders who are creating stronger, more sustainable communities. We don't just talk about progress and innovation here – we make it happen, with our customers, where we work and live. Together, we are building a better world, so we can all enjoy living in it.

Job Summary

We are seeking a skilled Manager Digital Operations Support (Modern Manufacturing Digital Platform & Applications) to join MANUF & SUPPLY DIG PLAT (INDIA) - **CAT IT Division.**

The Operations & Data Support Manager is responsible for leading end-to-end application, platform, and data operations support for the Modern Manufacturing Digital Platform (MMDP). This role ensures high availability, reliability, performance, and data quality across manufacturing digital systems.

The role drives operational excellence by integrating application support, platform reliability, and DataOps practices, and partners closely with engineering, product, data, and business teams to deliver seamless plant, factory, and enterprise digital experiences.

The preferred location for this role is Chennai - Brigade World Trade Centre.

What you will do

This role leads 24x7 production support, platform operations, and DataOps functions, ensuring stable and scalable digital manufacturing solutions. It is accountable for incident, problem, and data pipeline operations, and plays a critical role in enabling real-time manufacturing insights, analytics, and decision-making.

Key Responsibilities

Operations, Support & Service Delivery

Lead and manage day-to-day production support across application, platform, and data layers
Ensure adherence to SLA, SLO, and KPI metrics for incidents, service requests, and data availability
Drive rapid incident resolution and minimize plant/business disruption
Own overall operational health, system performance, and availability of MMDP systems

Data Operations (DataOps) & Data Reliability

Own end-to-end DataOps processes including data ingestion, transformation, pipelines, and data quality
Ensure availability, accuracy, and timeliness of manufacturing data pipelines
Lead resolution of data-related incidents, including pipeline failures and data inconsistencies
Drive data observability, lineage, and monitoring capabilities
Collaborate with Data Engineering and Analytics teams to improve data reliability (SLAs) and governance

Incident & Problem Management

Act as primary escalation point for P1/P2 incidents (application, platform, and data)
Lead root cause analysis (RCA) and drive permanent fixes across systems and pipelines
Establish structured post-incident review processes
Reduce MTTR through automation, predictive alerts, and proactive issue detection

Service Reliability, Monitoring & Observability

Define and implement monitoring and alerting strategies across applications and data pipelines
Enable end-to-end observability (application + data + infrastructure)
Drive improvements in system resilience, reliability, and automation
Ensure proactive detection using tools, telemetry, and analytics

Team Leadership & Vendor Management

Lead a distributed team of support engineers, SREs, and DataOps engineers
Define clear roles, responsibilities, and escalation paths across support tiers
Manage vendor partners (TPL/SOW) and ensure performance, productivity, and cost optimisation
Establish and manage a 24x7 global support model

Continuous Improvement & Automation

Identify opportunities to improve efficiency through automation, AIOps, and DataOps practices
Develop and maintain runbooks, playbooks, and knowledge base
Standardise operational processes aligned with ITSM frameworks (e.g., ServiceNow)
Optimise incident trends, backlog management, and workflow efficiency

Cross-Functional Collaboration

Partner with Engineering, Product, Data, and Plant stakeholders for releases and support transitions
Support new platform rollouts, plant onboarding, and hypercare phases
Ensure strong alignment across application, data, and architecture teams

Metrics, Reporting & Governance

Track, analyse, and report key metrics:
SLA compliance
Incident volumes and trends
Data pipeline success rates and data quality metrics
MTTR / MTBF
System and data availability
Provide insights and recommendations to improve operational and data excellence
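
As an illustration only, the metrics listed above could be computed along the following lines. This is a minimal sketch and not part of the role's stated requirements: the `Incident` record, the function names, and the reporting-window convention are all hypothetical, and it assumes MTBF is measured as uptime within the window divided by the number of failures.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical incident record; field names are illustrative."""
    opened: datetime    # when the disruption started
    resolved: datetime  # when service was restored


def total_downtime(incidents: list[Incident]) -> timedelta:
    # Sum of outage durations (assumes incidents do not overlap).
    return sum((i.resolved - i.opened for i in incidents), timedelta())


def mttr(incidents: list[Incident]) -> timedelta:
    # Mean time to restore: average outage duration.
    if not incidents:
        raise ValueError("MTTR is undefined without incidents")
    return total_downtime(incidents) / len(incidents)


def mtbf(incidents: list[Incident], window: timedelta) -> timedelta:
    # Mean time between failures: uptime in the window per failure.
    if not incidents:
        raise ValueError("MTBF is undefined without incidents")
    return (window - total_downtime(incidents)) / len(incidents)


def availability(incidents: list[Incident], window: timedelta) -> float:
    # Fraction of the reporting window during which the system was up.
    return 1.0 - total_downtime(incidents) / window


def pipeline_success_rate(succeeded: int, total: int) -> float:
    # Share of pipeline runs that completed successfully.
    if total == 0:
        raise ValueError("no pipeline runs in the period")
    return succeeded / total
```

In practice these figures would usually come from the ITSM tool's incident records and the pipeline scheduler's run history rather than hand-built objects.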

**What you will have**

Bachelor’s degree in Engineering, Computer Science, Data Engineering, or related field
10–15+ years in IT operations, production support, SRE, or DataOps
5+ years of people leadership experience
Strong experience in:
Incident and problem management
Data pipeline operations (ETL/ELT, streaming, batch)
Cloud platforms (Azure/AWS)

Preferred Skills

Strong expertise in DataOps, data platforms (e.g., Kafka, Spark, Data Lake)
Experience with ITSM tools (ServiceNow or equivalent)
Solid understanding of:
Microservices and distributed systems
Data engineering and analytics pipelines
Experience in DevOps, AIOps, and automation frameworks
Proven experience managing global 24x7 support models
Strong stakeholder engagement and communication skills

Leadership Expectations

Drive end-to-end operational and data accountability
Foster a culture of proactive ownership, reliability engineering, and automation
Enable data-driven decision making through reliable operations
Promote continuous improvement across application and data ecosystems
Additional information: This position requires the candidate to work a five-day-a-week schedule in the office.

Skills desired:

Decision Making and Critical Thinking: Knowledge of the decision-making process and associated tools and techniques; ability to accurately analyze situations and reach productive decisions based on informed judgment.
Level Extensive Experience:
• Differentiates assumptions, perspectives, and historical frameworks.
• Evaluates past decisions for insights to improve decision-making process.
• Assesses and validates decision options and points and predicts their potential impact.
• Advises others in analyzing and synthesizing relevant data and assessing alternatives.
• Uses effective decision-making approaches such as consultative, command, or consensus.
• Ensures that assumptions and received wisdom are objectively analyzed in decisions.

Effective Communications: Understanding of effective communication concepts, tools and techniques; ability to effectively transmit, receive, and accurately interpret ideas, information, and needs through the application of appropriate communication behaviors.
Level Extensive Experience:
• Reviews others' writing or presentations and provides feedback and coaching.
• Adapts documents and presentations for the intended audience.
• Demonstrates both empathy and assertiveness when communicating a need or defending a position.
• Communicates well downward, upward, and outward.
• Employs appropriate methods of persuasion when soliciting agreement.
• Maintains focus on the topic at hand.

Problem Solving: Knowledge of approaches, tools, techniques for recognizing, anticipating, and resolving organizational, operational or process problems; ability to apply knowledge of problem solving appropriately to diverse situations.
Level Extensive Experience:
• Ensures capture of lessons to be learned from a problem-solving effort.
• Organizes potential problem solvers and leads problem resolution efforts.
• Uses varying problem-solving approaches and techniques as appropriate.
• Contributes to standard practices for problem-solving approaches, tools, and processes.
• Analyzes and synthesizes information and devises alternative resolution strategies.
• Develops successful resolutions to critical or wide-impact problems.

Organizational Leadership: Knowledge of leadership concepts and ability to use strategies and skills to enlist others in setting, embracing and achieving objectives.
Level Extensive Experience:
• Employs various group decision-making methods depending on the situation.
• Promotes efficacy through monitoring, coaching & motivating subordinates, intervention, etc.
• Translates vision into specific functional or departmental initiatives.
• Uses emotional contagion to affect the mood of group members, tone of group and group processes.
• Uses a normative decision model (with leadership styles and situational variables) to select style.
• Initiates structure: role clarification, setting standards, holding subordinates accountable, etc.

Customer Interaction: Knowledge of the principles and techniques of communicating with a customer; ability to utilize tools and techniques for customer interaction.
Level Extensive Experience:
• Provides customer support on difficult problems.
• Teaches techniques for transitioning from problem solving to sales opportunities.
• Analyzes the characteristics of threats and selects the proper plan of action for handling them.
• Evaluates techniques for face-to-face, telephone and web-based interactions.
• Advises on methods for identifying leads and procedures for turning them over.
• Implements tactics to de-escalate problem situations immediately.

Customer Support Policies, Standards and Procedures: Knowledge of the organization's customer support policies, standards and procedures; ability to guide customers on all company interactions.
Level Extensive Experience:
• Collaborates with other functions on establishing and documenting joint standards.
• Develops quality assurance and monitoring mechanisms.
• Advises on the development of customer support policies and practices.
• Implements cross-functional standards and procedures.
• Evaluates the impact of standards and policies across functional specialties.
• Analyzes existing and evolving procedures for their efficiency and effectiveness.

Software Release Management: Knowledge of strategies, practices and tools for managing versions and distribution of software products and enhancements; ability to evaluate and improve release management practices and tools.
Level Extensive Experience:
• Participates in the development and testing of back-out procedures and recovery strategies.
• Oversees all aspects of release management for the entire installed customer base.
• Develops and disseminates communications about new release training.
• Drafts communications about product release delays.
• Develops distribution dates based on development, testing (QA), validation, packaging activities.
• Conducts impact analysis for major releases or critical applications; assesses benefits and risks.

Software Reliability Management: Knowledge of software reliability management; ability to develop and use principles, methodologies and metrics that increase software product performance and reliability.
Level Extensive Experience:
• Discusses variables used in standard reliability metrics and their impact on a given metric.
• Identifies methods to improve product performance and recommends design changes.
• Gathers information on product failures to be used for product redesign or enhancement.
• Demonstrates experience with reliability engineering practices for a range of products.
• Uses probability and statistical measures to assess products, estimate and monitor reliability.
• Participates in the definition of success criteria for products and tests.

What you will get:

Work Life Harmony
Earned and medical leave.
Relocation assistance

Holistic Development

Personal and professional development through Caterpillar's employee resource groups across the globe
Career development opportunities with global prospects

Health and Wellness

Medical coverage: medical, life, and personal accident coverage
Employee mental wellness assistance program

Financial Wellness

Employee investment plan
Pay for performance: annual incentive bonus plan

Additional Information:

Caterpillar is not currently hiring individuals for this position who now or in the future require sponsorship for employment visa status; however, as a global company, Caterpillar offers many job opportunities outside of the U.S. which can be found through our employment website at www.caterpillar.com/careers

This position requires working onsite five days a week.

Visa Sponsorship is not available for this position.

Posting Dates:

June 17, 2026 - June 30, 2026

Caterpillar is an Equal Opportunity Employer. Qualified applicants of any age are encouraged to apply.
