Agentic AI / AI Ops Engineer – Platform Engineering

Caterpillar · Industrial · Chennai, Tamil Nadu

This role focuses on building production-grade Agentic AI and AI Ops solutions, including autonomous workflows and LLM-based applications, to improve platform operations and developer experience. It involves designing, implementing, and deploying these systems, integrating them with cloud-native infrastructure, and managing the AI lifecycle.

What you'd actually do

  1. Design and implement agentic AI systems that plan, reason, and execute multi-step workflows across platform and portal ecosystems
  2. Build and deploy AI-driven automation for operations (incident detection, triage, remediation, and monitoring)
  3. Develop and productionize LLM-based applications and intelligent workflows integrated with APIs, tools, and enterprise systems
  4. Integrate AI solutions with platform infrastructure (Kubernetes, CI/CD, observability, telemetry pipelines)
  5. Establish scalable patterns for the AI lifecycle (design → deployment → monitoring → optimization)

Skills

Required

  • Experience building and deploying AI/ML or Generative AI solutions in production
  • Strong software engineering fundamentals (system design, CI/CD, testing, monitoring)
  • Experience with cloud-native and distributed systems

Nice to have

  • Experience building agentic or LLM-based systems (e.g., multi-step workflows, tool integration, memory/context handling)
  • Strong programming skills (Python or similar)
  • Kubernetes and modern platform infrastructure
  • Observability (logs, metrics, traces) and telemetry systems
  • Workflow/orchestration frameworks (e.g., LangGraph, AutoGen, or similar)
  • Understanding of AI Ops, SRE practices, and reliability engineering principles
  • Experience productionizing AI systems with a focus on scalability, performance, and reliability

What the JD emphasized

  • Agentic AI
  • AI Ops
  • production-grade AI systems
  • autonomous workflows
  • LLM-based applications
  • AI lifecycle

Other signals

  • AI lifecycle management