Site Reliability Engineer - Video Infrastructure

ByteDance · Big Tech · Seattle, WA · R&D

Site Reliability Engineer for ByteDance's Video Cloud Infra team, focusing on building and managing global infrastructure for multimedia transport, storage, and processing. Responsibilities include system management, automation, and ensuring reliability and efficiency of large-scale distributed systems.

What you'd actually do

  1. Build global infrastructure for multimedia transport, storage, and processing that serves billions of users worldwide.
  2. Manage global production systems, including monitoring, emergency response, capacity planning, and optimization.
  3. Build tools, automation, visualizations, and monitors to support the operation and optimization of the global infrastructure.
  4. Engage in and improve the whole service lifecycle, from inception and design, through deployment, operation and refinement.
  5. Scale up systems sustainably through mechanisms like automation, and initiate changes that improve system reliability and processing speed.

Skills

Required

  • Bachelor's degree in Computer Science or a related technical background involving software/system engineering, or equivalent working experience.
  • Extensive knowledge of SRE responsibilities, such as monitoring, incident handling, capacity management and disaster recovery.
  • Extensive knowledge of networking, operating systems, database systems, and container technology.
  • Good understanding of every aspect of microservice architecture, and hands-on experience troubleshooting large-scale distributed systems.

Nice to have

  • Good programming experience with at least one of the following languages: C, C++, Java, Python, or Go.
  • Hands-on experience with common open-source systems such as Linux, MySQL, MongoDB, Redis, and ELK; experience building solutions with AWS, Google Cloud, Azure, and other cloud services is a plus.
  • Passionate and self-motivated, with strong ownership and good teamwork skills.