Lead System Engineer (AI Automation Engineer, SRE Focus)

AT&T · Telecom · USA:TX:Plano +3

Lead System Engineer focused on AI Automation and Site Reliability Engineering (SRE) for mission-critical enterprise platforms (CRM, Supply Chain, ERP). The role involves designing and implementing AI-driven solutions using Generative AI, LLMs, and Agentic AI for incident triage, root cause analysis, remediation, prevention, and autonomous operations. It also covers applying AIOps techniques, developing LLM-enabled runbooks, and building AI-augmented observability solutions to improve system resilience and operational efficiency.

What you'd actually do

  1. Architect and deliver AI-powered automation solutions for production operations, including intelligent incident triage, root cause analysis, remediation, and prevention.
  2. Design Agentic AI workflows that autonomously monitor systems, analyze anomalies, trigger corrective actions, and orchestrate recovery across ERP, supply chain, and integration layers.
  3. Apply AIOps techniques to correlate metrics, logs, events, and traces for predictive alerting, noise reduction, and proactive reliability improvements.
  4. Develop LLM-enabled runbooks and intelligent assistants to guide operational decision-making, accelerate incident response, and upskill operations teams.
  5. Own platform stability, uptime, and performance across Oracle EBS/ERP, Oracle Fusion Cloud, and supply chain execution systems.

Skills

Required

  • Python
  • Shell scripting
  • SQL, PL/SQL
  • Generative AI
  • LLMs
  • Agentic AI frameworks
  • AIOps
  • Site Reliability Engineering (SRE)
  • Enterprise application engineering
  • Production operations
  • Automation

Nice to have

  • Oracle EBS
  • Oracle Fusion Cloud
  • Dynatrace
  • AppDynamics
  • Splunk
  • ELK
  • Grafana
  • Docker
  • Kubernetes
  • Azure
  • o9
  • Blue Yonder/JDA
  • RELEX
  • Oracle SOA/OIC
  • MuleSoft
  • Kafka/JMS
  • EDI

What the JD emphasized

  • AI-driven automation
  • Agentic AI workflows
  • LLM-enabled runbooks
  • AI-augmented observability solutions
  • automation-first mindset
  • Generative AI, LLMs, or Agentic AI frameworks

Other signals

  • AI-driven automation
  • AIOps
  • Agentic AI workflows
  • LLM-enabled runbooks
  • predictive alerting
  • autonomous dashboards