Site Reliability Engineer

PayPal · Fintech · Bengaluru, KA, IN +2 · Site Reliability Engineering

This Site Reliability Engineer role at PayPal focuses on production support for Credit Engineering, which covers Revolving Credit, Closed-Ended, and Merchant Lending products. Responsibilities include monitoring, incident response, automation, CI/CD, and system reliability.

What you'd actually do

  1. Actively monitor and analyze system metrics to ensure the availability, performance, and reliability of digital platforms and applications.
  2. Diagnose and resolve complex system issues, perform root cause analysis, and implement long-term fixes to prevent recurrence.
  3. Create and maintain automation scripts, tools, and processes to streamline operations, reduce manual effort, and enhance reliability.
  4. Configure and improve monitoring and alerting tools to provide actionable insights into system health and performance.
  5. Analyze system usage trends and forecast future resource requirements to ensure scalability and prevent capacity-related issues.

Skills

Required

  • 1+ years of relevant experience and a Bachelor’s degree, or an equivalent combination of education and experience.

Nice to have

  • 3+ years of experience in Production Engineering, Site Reliability Engineering, or similar roles
  • Strong problem-solving skills with the ability to debug complex, distributed, multi-tier applications
  • Proven experience leading and driving resolution of high-severity production incidents, including coordinating across teams
  • Demonstrated ability to identify recurring issues and drive long-term fixes and operational improvements
  • Strong understanding of microservices architecture and the software development lifecycle (SDLC); proficiency in at least one programming language (e.g., Java, Python) to debug issues and collaborate with development teams
  • Hands-on experience with monitoring and alerting tools (e.g., Splunk, Datadog, Nagios, Kibana)
  • Solid understanding of databases, including SQL and stored procedures (e.g., BigQuery, Oracle, PostgreSQL)
  • Experience working in cloud environments (AWS preferred)
  • Proficiency in Unix/Linux systems and shell scripting
  • Experience with batch job schedulers or workflow orchestration tools (e.g., Control-M, Airflow, UC4)
  • Familiarity with incident management and collaboration tools (e.g., JIRA, Confluence, ServiceNow)
  • Strong verbal and written communication skills, with the ability to influence and collaborate across cross-functional teams
  • Experience building automation to reduce operational toil (e.g., scripting, tooling, runbook automation)
  • Experience mentoring junior engineers or leading incident reviews/postmortems