Principal Software Engineer - Ad Tech & Distributed Systems - FreeWheel

Comcast · Media · Chicago, IL

The Principal Software Engineer focuses on the reliability, performance, and operational excellence of large-scale distributed ad tech platforms. Responsibilities include designing, operating, and troubleshooting these systems and owning monitoring, incident response, change management, and capacity planning. The role also involves leading the resolution of complex issues, automating workflows, and partnering with engineering teams to ensure production readiness and scalability.

What you'd actually do

  1. Own production reliability, availability, latency, and performance of large‑scale, mission‑critical systems
  2. Design, implement, and operate monitoring, alerting, and observability solutions to ensure system health and rapid detection of issues
  3. Lead incident response, root cause analysis, and post‑incident reviews to drive long‑term reliability improvements
  4. Support and ensure stable operations during high‑visibility, time‑sensitive live events and releases
  5. Drive automation initiatives to reduce operational toil, improve efficiency, and increase system resilience

Skills

Required

  • AWS
  • Python
  • Go (Golang)
  • Scala
  • distributed systems
  • backend services
  • data processing platforms
  • data pipelines
  • large-scale system architectures
  • Linux systems
  • system internals
  • networking
  • production infrastructure
  • AWS cloud architecture
  • VPC
  • subnets
  • NACLs
  • security groups
  • EC2
  • S3
  • IAM
  • Route 53
  • Lambda
  • infrastructure-as-code
  • configuration management tools
  • CI/CD
  • SDLC tools
  • Docker
  • Kubernetes
  • Jenkins
  • Git
  • Ansible
  • Chef
  • Puppet
  • database technologies
  • SQL
  • performance tuning
  • operational data management
  • analytical and data-driven problem-solving skills
  • metrics
  • communication skills
  • attention to detail
  • adaptability
  • collaboration in a global, cross-functional team

Nice to have

  • AWS OpsWorks
  • C++ Programming Language
  • Python (Programming Language)
  • Systems Design

What the JD emphasized

  • 10+ years of professional experience in software development/engineering
  • 5+ years of experience with AWS
  • Expert-level coding, debugging, and troubleshooting skills across complex, distributed production systems
  • Proven ability to lead and mentor engineers in automation, reliability engineering, and production problem-solving
  • Strong experience designing and operating server-side applications or services using Python, Go (Golang), or Scala
  • Experience developing, operating, and troubleshooting distributed systems and backend services
  • Deep knowledge of Linux systems, system internals, networking, and production infrastructure
  • Extensive experience with AWS cloud architecture and services including VPC, subnets, NACLs, security groups, EC2, S3, IAM, Route 53, Lambda, and related services
  • Mastery of CI/CD and SDLC tools (Docker, Kubernetes, Jenkins, Git, Ansible, Chef, and Puppet)
  • Strong understanding of database technologies, SQL, performance tuning, and operational data management