Site Reliability Engineer 3 - Chicago or Denver - Onsite

Comcast · Media · Chicago, IL - Englewood, CO

Site Reliability Engineer role focused on ensuring the reliability, scalability, and performance of FreeWheel's ad platform systems. Responsibilities include managing infrastructure, optimizing system reliability, automating operations, and resolving technical issues. Requires experience in SRE/DevOps, cloud platforms, infrastructure as code, automation tools, and programming languages like Python or Go. Key tasks involve system monitoring, automation development, performance optimization, incident response, capacity planning, and cross-team collaboration.

What you'd actually do

  1. Design and implement monitoring and alerting systems to ensure the stability, reliability, and performance of data platforms. Join the on-call rotation to respond to and resolve issues quickly.
  2. Develop and maintain automation tools and scripts for deployment, monitoring, backup and disaster recovery.
  3. Analyze and optimize data storage, query performance, and data flows to ensure efficient processing of large-scale datasets, reduce latency, and improve processing speed.
  4. Respond quickly to platform failures, perform troubleshooting, and coordinate cross-team efforts to resolve issues and ensure high availability and reliability.
  5. Work with engineering teams to analyze and forecast capacity requirements, ensuring the system can handle traffic growth and scale infrastructure accordingly. Support FreeWheel-powered live events.

Skills

Required

  • 3+ years of experience as an SRE, DevOps or Operations Engineer
  • Experience with an automation tool or framework such as Ansible, Terraform, Kubernetes, or Docker for automating system deployment
  • Proficient in at least one programming language, such as Python, Go, Java, or Scala
  • Familiarity with monitoring and log management tools such as Prometheus, Grafana, the ELK Stack, or similar
  • Excellent communication skills

Nice to have

  • Experience with cloud platforms (e.g. AWS, OCI, GCP, Azure)
  • Hands-on experience with Terraform and infrastructure as code principles