Data Center Engineer

Roblox · Consumer · Chicago, IL · Engineering Operations

This role is for a Senior Data Center Engineer responsible for developing, maintaining, and scaling Core/Edge Data Centers and hardware infrastructure at Roblox. The engineer will manage server, network, power, and environmental systems, troubleshoot issues, implement best practices, automate maintenance, and participate in on-call rotations. The role requires extensive experience in large-scale data center environments, server and network equipment management, and problem-solving.

What you'd actually do

  1. Develop and maintain the Core/Edge Data Center and hardware infrastructure to meet the large-scale, real-time requirements of our Imagination Platform™, so that our community has an awesome experience anywhere in the world. This covers the full life cycle of servers, network infrastructure, power, and environmental systems.
  2. Own efforts to track and mitigate systemic issues preventing hosts from returning to service.
  3. Identify and solve critical problems, and prevent them from recurring through root cause analysis and recommendations for improved automation.
  4. Coordinate with peers to establish and uphold best practices for break-fix, installation, decommissioning, and all other aspects of data center operations.
  5. Create, influence, and improve the development platform, infrastructure, standards (Runbooks, SOPs, MOPs), and methods so that scalability and high-availability goals can be met.

Skills

Required

  • 6+ years of experience working in large-scale Data Center Infrastructure environments
  • Experience planning, executing, and documenting repairs in the server and networking domains
  • Extensive experience installing, monitoring, and maintaining server and network equipment
  • In-depth knowledge of data center environments, servers, and network equipment
  • Proven ability to handle multiple tasks simultaneously
  • Experience installing various equipment that commonly resides in the data center environment
  • Ability to lift 75 pounds occasionally

Nice to have

  • Experience with automation of maintenance actions
  • Vendor coordination and quality assurance for outsourced projects
  • On-call rotation participation

What the JD emphasized

  • critical problems
  • root cause analysis
  • continuous improvement
  • critical infrastructure