Lead Member of Technical Staff - AI Systems Engineer

AT&T · Telecom · USA:TX:Plano +1

This role focuses on building AI-powered operational systems for AT&T's broadband and Wi-Fi ecosystem, using Generative AI and LLMs to improve reliability, performance, and customer experience. The engineer will develop intelligent systems for network analysis, troubleshooting, root-cause analysis, and autonomous operations.

What you'd actually do

  1. Build AI systems that directly operate and optimize production networks at massive scale.
  2. Design, develop, and deploy AI-driven operational platforms leveraging LLMs, automation frameworks, and large-scale data pipelines.
  3. Build intelligent solutions that detect anomalies, identify root causes, and automate issue triage across gateway and broadband environments.
  4. Develop LLM-based applications that analyze logs, telemetry, and operational data to generate actionable insights.
  5. Create Retrieval-Augmented Generation (RAG), prompt-engineering, and agentic AI workflows that support engineering and operational teams.

Skills

Required

  • Python
  • modern software development practices
  • designing and operating production-grade distributed systems, services, or data pipelines
  • implementing Generative AI solutions
  • LLMs
  • RAG architectures
  • prompt engineering
  • AI workflows
  • system reliability
  • observability
  • debugging
  • operational excellence

Nice to have

  • real-time analytics and telemetry platforms
  • Apache Pinot
  • applying AI/ML solutions to operational challenges
  • AIOps
  • incident management
  • log analysis
  • cloud-native architectures
  • distributed computing environments
  • integrating AI into software engineering workflows
  • CI/CD
  • automated testing
  • networking technologies
  • Wi-Fi systems
  • large-scale connected device ecosystems
  • solving ambiguous operational problems through scalable technical solutions

What the JD emphasized

  • AI Systems Engineer
  • Generative AI
  • LLM technologies
  • intelligent systems
  • autonomous network operations
  • AI-driven operational platforms
  • LLM-based applications
  • agentic AI workflows
  • production-grade distributed systems
  • Generative AI solutions
  • LLMs
  • RAG architectures
  • prompt engineering
  • AI workflows