Manager, Incident Ops and Observability at F5

What you'd actually do

Lead the global Incident Response (IR) program, optimizing processes across detection, triage, containment, remediation, and post-incident analysis.

Hire, mentor, and train global team members on incident response best practices and observability tooling.

Serve as a technical lead and head engineer for the creation and management of monitoring tools and services to support F5 infrastructure and business systems.

Serve as the primary incident commander during major incidents, ensuring timely resolution, excellent communication, and stakeholder alignment.

Define and continuously refine incident response policies, procedures, and runbooks to ensure consistent and effective handling of incidents.

Skills

Required

Incident response
NOC/SOC/SRE
Monitoring
Observability
Cloud and hybrid environments
Problem Management
Change Management
Configuration Management
ITSM platform (e.g., ServiceNow)
Observability tools (e.g., Grafana, ThousandEyes, LogicMonitor, Pingdom, Zabbix)
AWS
Google Workspace
SaaS platforms
Leadership
Communication skills

Nice to have

Experience in infrastructure, IT, or security organizations
Tableau, Power BI, or other reporting/analytics platforms
SIEM, SOAR, and log analysis tools (e.g., Splunk, Datadog, Panther, CrowdStrike)
ITIL 4 and/or Six Sigma certifications

What the JD emphasized

10+ years managing incident response within NOC/SOC/SRE teams with a focus on monitoring and observability.

Proven track record of managing complex operational incidents in cloud and hybrid environments.

Experience driving continuous improvement and operational excellence in processes such as Problem Management, Change Management, and Configuration Management.

Experience working with and/or managing CMDB governance leveraging an ITSM platform (e.g., ServiceNow).

Experience integrating runbooks, operational processes, and metrics reporting into an ITSM platform (e.g., ServiceNow).

Experience with observability tools, especially tooling focused on synthetics, metrics, and infrastructure telemetry (e.g., Grafana, ThousandEyes, LogicMonitor, Pingdom, Zabbix).

Manager, ITSM and Observability

About F5

At F5, we strive to bring a better digital world to life. Our teams empower organizations across the globe to create, secure, and run applications that enhance how we experience our evolving digital world. We are passionate about cybersecurity, from protecting consumers from fraud to enabling companies to focus on innovation.

Everything we do centers around people. That means we obsess over how to make the lives of our customers, and their customers, better. And it means we prioritize a diverse F5 community where each individual can thrive.

Position Summary

We are seeking a manager to help build our new Site Reliability Engineering and ITSM team to strengthen operational excellence across the Infrastructure & Security and F5 Digital organizations. This role will play an important part in Digital’s incident management strategy and IT Service Management practices by building out the Reliability Operations Center and monitoring capabilities required to help Digital understand problems before our users do.

The ideal candidate will bring deep expertise in incident lifecycle management—from detection and triage to resolution and post-mortem—and will collaborate cross-functionally to drive continuous improvement in our security posture. This leader will operationalize a world-class incident management program while also defining and implementing the vision for observability across F5’s hybrid infrastructure and cloud environments. In addition, this role will be responsible for maturing related ITSM processes including Problem Management, Change Management, and Configuration Management. This role requires strong leadership, technical acumen, and the ability to operate under pressure while maintaining clear communication with stakeholders at all levels.

Key Responsibilities

Lead the global Incident Response (IR) program, optimizing processes across detection, triage, containment, remediation, and post-incident analysis.
Hire, mentor, and train global team members on incident response best practices and observability tooling.
Serve as a technical lead and head engineer for the creation and management of monitoring tools and services to support F5 infrastructure and business systems.
Serve as the primary incident commander during major incidents, ensuring timely resolution, excellent communication, and stakeholder alignment.
Define and continuously refine incident response policies, procedures, and runbooks to ensure consistent and effective handling of incidents.
Drive improvements in detection, escalation, and resolution through automation, tooling, and process enhancements.
Define and report KPIs for service reliability, incident response, and observability maturity to senior leadership.
Conduct root cause analyses and lead post-incident reviews to identify lessons learned and prevent recurrence.
Design and lead cross-functional tabletop exercises to strengthen organizational preparedness, communication, and response coordination during major incidents.
Maintain detailed incident records and metrics to support auditing, compliance, and continuous improvement.
Collaborate with ServiceNow teams and architects to manage incidents.
Establish and maintain on-call rotations with teams who own critical applications across the Digital organization.
Establish and lead Problem Management, Change Management, and Configuration Management functions to improve operational excellence across Digital/IT.

Qualifications

10+ years managing incident response within NOC/SOC/SRE teams with a focus on monitoring and observability.
Proven track record of managing complex operational incidents in cloud and hybrid environments.
Experience driving continuous improvement and operational excellence in processes such as Problem Management, Change Management, and Configuration Management.
Experience working with and/or managing CMDB governance leveraging an ITSM platform (e.g., ServiceNow).
Experience integrating runbooks, operational processes, and metrics reporting into an ITSM platform (e.g., ServiceNow).
Experience with observability tools, especially tooling focused on synthetics, metrics, and infrastructure telemetry (e.g., Grafana, ThousandEyes, LogicMonitor, Pingdom, Zabbix).
Excellent communication skills with the ability to convey technical information to both technical and non-technical audiences.
Ability to lead under pressure, prioritize effectively, and make decisions in high-stakes situations.
Familiarity with AWS, Google Workspace, and common SaaS platforms.
Bachelor’s degree in Computer Science, Cybersecurity, Information Systems, or related field (or equivalent experience).

Preferred Qualifications

Experience working in infrastructure, IT, or security organizations.
Familiarity with tools such as Tableau, Power BI, or other reporting/analytics platforms.
Experience with SIEM, SOAR, and log analysis tools (e.g., Splunk, Datadog, Panther, CrowdStrike).
Comfortable navigating ambiguity, with a proactive approach to problem-solving.
Strong interest in scaling operations and driving impact in security-focused initiatives.
ITIL 4 and/or Six Sigma certifications.


The Job Description is intended to be a general representation of the responsibilities and requirements of the job. However, the description may not be all-inclusive, and responsibilities and requirements are subject to change.

The annual base pay for this position is: $170,400.00 - $255,600.00

F5 maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, geographic locations, and market conditions, as well as to reflect F5’s differing products, industries, and lines of business. The pay range referenced is as of the time of the job posting and is subject to change.

You may also be offered incentive compensation, bonus, restricted stock units, and benefits. More details about F5’s benefits can be found at the following link: https://www.f5.com/company/careers/benefits. F5 reserves the right to change or terminate any benefit plan without notice.

Please note that F5 only contacts candidates through an F5 email address (ending with @f5.com) or an automated email notification from Workday (ending with f5.com or @myworkday.com).

Equal Employment Opportunity

It is the policy of F5 to provide equal employment opportunities to all employees and employment applicants without regard to unlawful considerations of race, religion, color, national origin, sex, sexual orientation, gender identity or expression, age, sensory, physical, or mental disability, marital status, veteran or military status, genetic information, or any other classification protected by applicable local, state, or federal laws. This policy applies to all aspects of employment, including, but not limited to, hiring, job assignment, compensation, promotion, benefits, training, discipline, and termination. F5 offers a variety of reasonable accommodations for candidates. Requesting an accommodation is completely voluntary. F5 will assess the need for accommodations in the application process separately from those that may be needed to perform the job. Request an accommodation by contacting accommodations@f5.com.