Staff Cyber Site Reliability Engineer (SRE)

GEICO · Insurance · Bethesda, MD +3

This role is for a Staff Cyber Site Reliability Engineer (SRE) focused on building reliable, observable, and scalable systems at the intersection of security and infrastructure. The primary focus is on engineering and automation to improve the reliability, performance, and operability of GEICO's security platforms and tooling. While AI/ML and LLMs are mentioned as a differentiator and an area to explore, the core responsibilities revolve around SRE practices, coding, automation, observability, and incident response for security systems.

What you'd actually do

  1. Define and drive reliability standards for cybersecurity platforms — establishing SLIs, SLOs, and error budgets; identifying systemic weaknesses; and engineering solutions that improve uptime, latency, and fault tolerance.
  2. Develop production-quality software in Python (required) and Golang (preferred) to automate operational workflows, build internal tooling, eliminate toil, and improve the day-to-day velocity of security engineering teams.
  3. Work closely with software engineers and infrastructure teams to review system designs for reliability, provide feedback on deployability and operability, and ensure that what gets built can be confidently operated and maintained in production.
  4. Instrument security platforms and pipelines with meaningful metrics, logs, and traces; build dashboards and alerting that give the team real operational visibility using tools like Grafana, Prometheus, and similar observability stacks.
  5. Be a first-responder for production issues affecting security systems; drive structured incident response, coordinate resolution, and produce blameless post-mortems with actionable follow-through to prevent recurrence.
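
To make the SLI, SLO, and error-budget vocabulary in item 1 concrete, here is a minimal Python sketch of how the three quantities relate. It is an illustration only, not GEICO's tooling: the class name ErrorBudget, its field names, and the sample numbers are all hypothetical, and it assumes a simple request-based SLI (good events divided by total events).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Hypothetical request-based error budget for one SLO window."""

    slo_target: float   # assumed target, e.g. 0.999 for "99.9% of requests succeed"
    total_events: int   # all requests observed in the window
    bad_events: int     # requests that violated the SLI (errors, too slow, ...)

    @property
    def sli(self) -> float:
        """Measured fraction of good events; an empty window counts as healthy."""
        if self.total_events == 0:
            return 1.0
        return (self.total_events - self.bad_events) / self.total_events

    @property
    def allowed_bad_events(self) -> float:
        """How many bad events the SLO tolerates in this window."""
        return (1.0 - self.slo_target) * self.total_events

    @property
    def budget_remaining(self) -> float:
        """Fraction of the error budget still unspent (negative if overspent)."""
        allowed = self.allowed_bad_events
        if allowed == 0:
            # A 100% target (or an empty window) leaves no room for failures.
            return 0.0 if self.bad_events else 1.0
        return 1.0 - self.bad_events / allowed


if __name__ == "__main__":
    # Made-up sample numbers, for illustration only.
    window = ErrorBudget(slo_target=0.999, total_events=1_000_000, bad_events=250)
    print(f"SLI: {window.sli:.5f}")
    print(f"Budget remaining: {window.budget_remaining:.1%}")
```

With a 99.9% target over one million requests, roughly 1,000 failures are tolerated, so 250 failures leave about three quarters of the budget unspent. A negative budget_remaining would indicate the SLO has been breached for the window.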

Skills

Required

  • Python Development
  • SRE / Platform Engineering Foundation
  • Object-Oriented Design
  • Observability & Monitoring
  • Incident Response
  • CI/CD & Infrastructure as Code

Nice to have

  • Golang Proficiency
  • AI/ML Development
  • Large Language Models
  • Generative AI

What the JD emphasized

  • Python Expertise (Required): Demonstrated production-level Python development, used for automation, tooling, and operational software. This is a non-negotiable requirement for consideration.