Principal Member of Technical Staff

Oracle · Enterprise · Bengaluru, Karnataka, India

Seeking an engineer to design, build, and scale intelligent operational platforms that use AIOps, ML, LLMs, observability, and automation to enable predictive and autonomous operations. Responsibilities include architecting AIOps capabilities such as incident routing, anomaly detection, operational copilots, ChatOps, and automated remediation, and integrating that intelligence into operational workflows.

What you'd actually do

  1. Design and build AIOps models using LLMs or classical ML for anomaly detection, correlation, root-cause identification, and intelligent event clustering.
  2. Develop operational copilots and chatbots capable of responding to incidents, surfacing insights, and driving automation through natural language.
  3. Build knowledge-grounding systems for operational copilots using runbooks, incident data, historical patterns, service maps, and topology.
  4. Build automated workflows for incident triage, diagnostics, collaboration, and remediation.
  5. Integrate AIOps models with observability platforms handling logs, metrics, traces, events, and topology data.

Skills

Required

  • System design
  • Platform & reliability engineering
  • ML engineering
  • Data engineering
  • AIOps
  • Python
  • Java
  • PyTorch
  • TensorFlow
  • Modern LLM frameworks
  • Automation workflows
  • StackStorm
  • Rundeck
  • Airflow
  • Jenkins
  • Cloud-native orchestration platforms
  • Observability data (logs, metrics, traces)
  • Observability platforms (Datadog, Splunk, Prometheus, Grafana, ELK)
  • RAG pipelines
  • Embeddings
  • Intent models
  • Operational chatbots
  • Streaming or event-driven systems (Kafka, Kinesis, Pub/Sub)
  • Cloud-native systems
  • Kubernetes
  • Microservices
  • Modern deployment patterns
  • Claude Code
  • Codex
  • GitHub Copilot
  • Context engineering
  • Agentic harness frameworks
  • MCP (Model Context Protocol) servers

Nice to have

  • Translate operational challenges into ML-based or automation-based solutions
  • Collaborate effectively across SRE, platform, service management, and engineering teams

What the JD emphasized

  • highly technical, hands-on role
  • strong depth in applied ML/LLMs
  • strong hands-on experience building ML or LLM-based systems
  • Deep understanding of observability data
  • Experience designing and deploying RAG pipelines, embeddings, intent models, or operational chatbots
  • Strong experience architecting streaming or event-driven systems
  • Excellent problem-solving skills
  • Hands-on experience with at least one of the following tools: Claude Code, Codex, GitHub Copilot
  • Good understanding of context engineering
  • Understanding of agentic harness frameworks
  • Experience building at least one MCP server

Other signals

  • AIOps
  • LLMs
  • Observability
  • Automation
  • RAG
  • Agentic workflows