Sr. Site Reliability Engineer

Visa · Fintech · Bengaluru, India

This role focuses on deploying, configuring, and maintaining monitoring, logging, and observability tools, and on automating operational tasks. It involves managing cloud infrastructure, CI/CD pipelines, and containerization technologies such as Docker and Kubernetes. The engineer will also participate in troubleshooting and root cause analysis for production incidents and contribute to documentation. While the role mentions using Generative AI tools, its core responsibilities are in SRE and infrastructure management.

What you'd actually do

  1. Support deployment and configuration of monitoring and logging tools.
  2. Automate routine operational tasks to improve efficiency and support system integration.
  3. Assist with maintenance and management of observability tools (Splunk, ClickHouse, Grafana, Prometheus, OpenTelemetry, Fluent Bit, Elasticsearch, OpenSearch, CloudWatch).
  4. Implement and maintain monitoring solutions in development, staging, and production environments.
  5. Contribute to setup and maintenance of CI/CD pipelines for automated build, test, and deployment.

Skills

Required

  • Experience in supporting deployment and configuration of monitoring and logging tools.
  • Experience in automating routine operational tasks to support system integration.
  • Experience with observability tools such as Splunk, ClickHouse, Grafana, Prometheus, OpenTelemetry, Fluent Bit, Elasticsearch, OpenSearch, and CloudWatch.
  • Experience in implementing and maintaining monitoring solutions across environments.
  • Experience in setting up and maintaining CI/CD pipelines for automated build, test, and deployment.
  • Experience in managing cloud infrastructure (AWS, GCP) for availability and security.
  • Experience with infrastructure as code tools (Terraform, Ansible, CloudFormation).
  • Experience in monitoring system performance and escalating issues.
  • Experience with containerization technologies (Docker, Kubernetes).
  • Experience in troubleshooting and root cause analysis for production incidents.

Nice to have

  • Experience in creating and updating documentation for infrastructure and operational procedures.
  • Experience in providing first-level support for infrastructure and deployment issues.
  • Experience in automating repetitive tasks and suggesting workflow improvements.
  • Experience in learning and applying DevOps and SRE best practices.
  • Experience in supporting implementation and management of containerization technologies.