Site Reliability Engineer - Data

ByteDance · Big Tech · San Jose, CA · R&D

Site Reliability Engineer focused on data infrastructure and applied machine learning (AML) at ByteDance. Responsibilities include managing the full service lifecycle, designing software platforms and monitoring frameworks, developing cloud-managed data infrastructure components (Kubernetes, Redis, MySQL, Flink), and establishing scaling mechanisms. Requires a bachelor's degree in Computer Science or a related technical field, 3+ years of programming experience, and familiarity with Unix/Linux internals, networking, and distributed systems.

What you'd actually do

  1. Participate in and enhance the complete service lifecycle, from inception and design, through development, capacity planning, launch reviews, deployment, operation, and refinement.
  2. Design and implement software platforms and monitoring frameworks to govern service-oriented architecture (SOA) efficiently, automatically, and intelligently.
  3. Develop and manage components of cloud-managed data infrastructure, encompassing technologies such as Kubernetes, Redis, MySQL, Flink, and more.
  4. Establish sustainable mechanisms for scaling systems, such as automation, to drive enhancements in reliability, efficiency, and velocity.
  5. Provide sustainable user support, manage incident response, and conduct blameless postmortems as part of our ongoing efforts to improve our systems.

Skills

Required

  • Bachelor's degree in Computer Science or a related technical field
  • 3+ years of experience programming in C, C++, Java, Python, Go, or Rust
  • Familiarity with Unix/Linux system internals
  • Familiarity with networking
  • Familiarity with distributed systems

Nice to have

  • Experience in MySQL
  • Experience in Redis
  • Experience in Nginx
  • Experience in Kubernetes
  • Experience in Docker
  • Experience in OpenStack
  • Experience in Hadoop
  • Experience in Spark
  • Experience in Flink
  • Experience in designing large-scale distributed systems
  • Experience in analyzing large-scale distributed systems
  • Strong skills in problem solving
  • Strong skills in communication