Site Reliability Engineer - Video Infrastructure

ByteDance · Big Tech · San Jose, CA · R&D

Site Reliability Engineer for ByteDance's Video Cloud Infra team, focusing on building and managing global infrastructure for multimedia transport, storage, and processing. Responsibilities include system reliability, monitoring, incident response, capacity planning, and automation to optimize performance and reduce costs for a platform serving billions of users.

What you'd actually do

  1. Build global infrastructure for multimedia transport, storage, and processing that serves billions of users worldwide.
  2. Manage global production systems, including monitoring, emergency response, capacity planning, and optimization.
  3. Build tools, automation, visualizations, and monitors that support the operation and optimization of the global infrastructure.
  4. Engage in and improve the whole service lifecycle, from inception and design, through deployment, operation and refinement.
  5. Scale up systems sustainably through mechanisms like automation, and initiate changes that improve system reliability and processing speed.

Skills

Required

  • SRE responsibilities
  • monitoring
  • incident handling
  • capacity management
  • disaster recovery
  • networking
  • operating systems
  • database systems
  • container technology
  • microservice architecture
  • large scale distributed systems

Nice to have

  • C++
  • Java
  • Python
  • Go
  • Linux
  • MySQL
  • MongoDB
  • Redis
  • ELK
  • AWS
  • Google Cloud
  • Azure