Site Reliability Engineer

Ford · Auto · United States · Enterprise Technology

Seeking an experienced Site Reliability Engineer (SRE) to join Ford's Enterprise Technology team. The role focuses on developing, enhancing, and expanding a global monitoring and observability platform, ensuring the uptime, scalability, and maintainability of critical cloud services on Google Cloud Platform (GCP). Responsibilities include writing and deploying code in Go and JavaScript, managing SRE monitoring backends built with Go, Postgres, and OpenTelemetry, developing tooling with Terraform, and collaborating with development teams to improve system reliability and performance.

What you'd actually do

  1. Write, configure, and deploy code in Go and JavaScript that improves service reliability for existing or new systems; set the standard for others with respect to code quality.
  2. Work within Google Cloud Platform (GCP) infrastructure, optimizing performance and cost, and scaling resources to meet demand.
  3. Implement and manage SRE monitoring application backends using Go, Postgres, and OpenTelemetry. Develop tooling using Terraform and other IaC tools to ensure visibility and proactive issue detection across our platforms.
  4. Collaborate with development teams to enhance system reliability and performance, applying a platform engineering mindset to system administration tasks.
  5. Troubleshoot and resolve issues in our dev, test, and production environments.

Skills

Required

  • Golang
  • JavaScript
  • Google Cloud Platform (GCP)
  • Kubernetes
  • OpenTelemetry
  • Terraform
  • IaC tools
  • Site Reliability Engineering
  • Software Development
  • Systems Engineering
  • Debugging
  • Troubleshooting
  • Performance Tuning
  • Disaster Recovery
  • Capacity Planning
  • Security Best Practices
  • PostgreSQL

Nice to have

  • Dynatrace

What the JD emphasized

  • 3+ years of experience as an SRE, Software Engineer, DevOps Engineer, or in a similar role
  • Solid programming skills in Golang and scripting languages
  • Proficient with monitoring and observability tools, particularly OpenTelemetry, Dynatrace, or similar
  • Proficient with cloud services, with a strong preference for Kubernetes and Google Cloud Platform (GCP) experience