Site Reliability Engineer

PayPal · Fintech · Scottsdale, AZ +1 · Site Reliability Engineering

A Site Reliability Engineer at PayPal is responsible for ensuring the availability, performance, and reliability of digital platforms and applications through monitoring, troubleshooting, automation, and system design. The role requires experience in production support, incident management, and system monitoring, along with system administration and scripting skills.

What you'd actually do

  1. Monitor and analyze system metrics to ensure optimal availability, performance, and reliability of digital platforms and applications, including continuous assessment of system health indicators, performance benchmarks, and service-level objectives.
  2. Diagnose and resolve complex system issues through systematic troubleshooting, perform thorough root cause analysis of system failures, and implement durable long-term fixes that prevent recurrence.
  3. Create and maintain automation scripts, tools, and operational processes that streamline operations, reduce manual intervention, and improve system reliability. Configure, maintain, and continuously improve monitoring and alerting systems so they provide actionable insight into system health and performance.
  4. Analyze system usage patterns and trends to forecast future resource requirements, ensure adequate scalability, and implement preventive measures that avoid capacity-related performance degradation or service interruptions.
  5. Design and implement highly reliable, fault-tolerant systems that follow industry best practices for high availability and disaster recovery. Ensure reliable software releases, and run systematic failure simulations to proactively identify weaknesses and improve overall system robustness.

Skills

Required

  • Production Support
  • Incident Management
  • System and Application Monitoring (Datadog, Splunk)
  • Batch Systems and Tools (Control-M)
  • System Administration
  • Configuration Management
  • Infrastructure Management
  • UNIX/Linux Administration
  • Technical Documentation (Confluence)
  • Programming/Scripting (Shell, Perl, Python)
  • Database Technologies (Oracle, PostgreSQL)
  • Cloud Technologies (AWS, Google Cloud)
  • Containerization Tools and Technologies (Kubernetes, Docker)