Data Center Engineer

Roblox · Consumer · Ashburn, VA · Engineering Operations

Roblox is seeking a Data Center Engineer to manage and prioritize host repair efforts, perform initial troubleshooting for hardware and network issues, and own projects to scale their data centers and hardware infrastructure. The role involves maintaining server and network infrastructure, collaborating across regions, identifying and solving operational problems, contributing to automation, and participating in an on-call rotation.

What you'd actually do

  1. Manage and prioritize your ticket queue according to defined priorities, performing initial troubleshooting for server and network issues, and escalating clearly when problems fall outside standard procedures.
  2. Maintain the Core Data Center and hardware infrastructure to meet the large-scale, real-time requirements of our Imagination Platform™, so that our community has an awesome experience anywhere in the world. This includes all aspects of the server, network infrastructure, power, and environmental life cycles.
  3. Collaborate across regions to track and mitigate systemic issues preventing hosts from returning to service.
  4. Identify and solve recurring operational problems through root cause analysis, and propose improvements to runbooks, SOPs, and MOPs to prevent recurrence.
  5. Contribute data, feedback, and requirements to partners building automation, ensuring that automation reflects real-world operational workflows.

Skills

Required

  • 4+ years of experience working in large-scale Data Center Infrastructure environments
  • Experience planning, executing, and documenting repairs in the server and networking domains
  • Extensive experience installing, monitoring, and maintaining server and network equipment
  • In-depth knowledge of data center environments, servers, and network equipment
  • Proven experience executing on multiple tasks simultaneously
  • Proficiency with server out‑of‑band management tools
  • Proficiency with Linux/Unix or Windows command-line tools
  • Experience installing the various equipment commonly found in data center environments
  • Able to lift 75 pounds occasionally

Nice to have

  • Building processes and procedures
  • Developing new capabilities as a team
  • Asking the right questions to solve issues within your expertise
  • Using data to test your theories
  • Formulating and identifying problems
  • Generating and evaluating a variety of solutions
  • Implementing the best solution(s)
  • Commitment to demonstrating professionalism in all interactions with partners
  • Fostering trust and upholding the reputation of the team and company