What you'd actually do

Design, implement, and operate cloud infrastructure supporting scalable, highly available AI‑enabled and Agentic platforms.

Apply Infrastructure as Code (IaC) practices (e.g., Terraform, Packer, Ansible) to provision and manage cloud resources consistently and securely.

Monitor platform health using metrics, logs, dashboards, and alerts, applying critical thinking to distinguish infrastructure, application, and AI‑driven failures.

Lead troubleshooting and resolution of complex cloud and platform issues, including distributed system and integration failures.

Develop and maintain automation and tooling (primarily Python and shell scripting) to improve reliability, diagnostics, and operational efficiency.

Skills

Required

Cloud Infrastructure Engineering (AWS, Azure, or GCP)
Infrastructure as Code (Terraform, Packer, Ansible, or equivalent)
Python Programming for automation and operational tooling
Monitoring, Logging, and Alerting Systems
CI/CD Tools and Release Automation
Security Fundamentals (IAM, secrets management, encryption, network security)
Foundational AI literacy

Nice to have

Experience supporting AI, ML, or Agentic platforms in production
Familiarity with data and streaming platforms (e.g., S3, SQS, Kafka‑like systems)
Understanding of networking and protocol standards
Exposure to hardware security modules (HSMs) or advanced key management solutions
Experience operating large‑scale distributed systems
Cloud cost optimization and performance tuning experience

What the JD emphasized

intelligent agents end to end

agent logic, workflows, integrations, and decision frameworks

deploy those agents into production and own their ongoing behavior, reliability, and impact

monitor agent performance, validate AI‑driven actions with human judgment

owning both construction and operations

AI‑enabled and Agentic platforms

AI‑driven failures

agent lifecycles, orchestration patterns, and non‑deterministic execution behaviors

AI‑assisted tools

Other signals

Designing, building, and operating intelligent agents end to end

Develop agent logic, workflows, integrations, and decision frameworks

Deploy those agents into production and own their ongoing behavior, reliability, and impact

Monitor agent performance and validate AI‑driven actions with human judgment

Own both construction and operations

Make your mark at Comcast -- a Fortune 30 global media and technology company. From the connectivity and platforms we provide, to the content and experiences we create, we reach hundreds of millions of customers, viewers, and guests worldwide. Become part of our award-winning technology team that turns big ideas into cutting-edge products, platforms, and solutions that our customers love. We create space to innovate, and we recognize, reward, and invest in your ideas, while ensuring you can proudly bring your authentic self to the workplace. Join us. You’ll do the best work of your career right here at Comcast. (In most cases, Comcast prefers to have employees on-site collaborating unless the team has been designated as virtual due to the nature of their work. If a position is listed with both office locations and virtual offerings, Comcast may be willing to consider candidates who live greater than 100 miles from the office for the remote option.)

Job Summary

The AI Agentic Team will be responsible for designing, building, and operating intelligent agents end to end. They develop agent logic, workflows, integrations, and decision frameworks, then deploy those agents into production and own their ongoing behavior, reliability, and impact. The team monitors agent performance, validates AI‑driven actions with human judgment, and iterates on designs as systems encounter real‑world complexity. By owning both construction and operations, they ensure agentic platforms are not just innovative, but reliable, secure, and scalable in production.

Job Description

What You’ll Do:

Design, implement, and operate cloud infrastructure supporting scalable, highly available AI‑enabled and Agentic platforms.
Apply Infrastructure as Code (IaC) practices (e.g., Terraform, Packer, Ansible) to provision and manage cloud resources consistently and securely.
Monitor platform health using metrics, logs, dashboards, and alerts, applying critical thinking to distinguish infrastructure, application, and AI‑driven failures.
Lead troubleshooting and resolution of complex cloud and platform issues, including distributed system and integration failures.
Develop and maintain automation and tooling (primarily Python and shell scripting) to improve reliability, diagnostics, and operational efficiency.
Implement and evolve observability solutions (e.g., Prometheus, Splunk, Grafana, CloudWatch) to improve system transparency and alert quality.
Collaborate with application, data, and AI teams to ensure platforms are operable, observable, and scalable prior to and after release.
Support CI/CD pipelines and release workflows, enabling safe and repeatable deployments across environments.
Ensure secure cloud operations, including secrets management, access controls, encryption, and secure networking practices.
Apply foundational AI literacy to understand agent lifecycles, orchestration patterns, and non‑deterministic execution behaviors that impact operations.
Use AI‑assisted tools to enhance investigation, documentation, and optimization, while validating outputs with sound engineering judgment.
Contribute to and maintain runbooks, platform documentation, and operational standards.
Participate in on‑call rotations and act as an escalation point during production incidents.
Perform other duties and responsibilities as assigned.

What You’ll Need:

Cloud Infrastructure Engineering (AWS, Azure, or GCP)
Infrastructure as Code (Terraform, Packer, Ansible, or equivalent)
Python Programming for automation and operational tooling
Monitoring, Logging, and Alerting Systems
CI/CD Tools and Release Automation
Security Fundamentals (IAM, secrets management, encryption, network security)

Preferred / Nice‑to‑Have Skills

Experience supporting AI, ML, or Agentic platforms in production
Familiarity with data and streaming platforms (e.g., S3, SQS, Kafka‑like systems)
Understanding of networking and protocol standards
Exposure to hardware security modules (HSMs) or advanced key management solutions
Experience operating large‑scale distributed systems
Cloud cost optimization and performance tuning experience

Disclaimer: This information has been designed to indicate the general nature and level of work performed by employees in this role. It is not designed to contain or be interpreted as a comprehensive inventory of all duties, responsibilities and qualifications.

Skills

Adaptability, AI Adoption, AI Ops, Cloud Infrastructure, Critical Thinking, Infrastructure As Code (IaC), Scripting

We believe that benefits should connect you to the support you need when it matters most, and should help you care for those who matter most. That's why we provide an array of options, expert guidance and always-on tools that are personalized to meet the needs of your reality—to help support you physically, financially and emotionally through the big milestones and in your everyday life.

Please visit the benefits summary on our careers site for more details.

Education

Bachelor's Degree

While possessing the stated degree is preferred, Comcast also may consider applicants who hold some combination of coursework and experience, or who have extensive related professional experience.

Certifications (if applicable)

Relevant Work Experience

7-10 Years

Comcast is an equal opportunity workplace. We will consider all qualified applicants for employment without regard to race, color, religion, age, sex, sexual orientation, gender identity, national origin, disability, veteran status, genetic information, or any other basis protected by applicable law.