Software Engineer, Site Reliability

Google · Big Tech · Mountain View, CA +1

This Software Engineer, Site Reliability role focuses on developing and improving tools, libraries, and systems for reliability, scalability, and efficiency. Responsibilities include code reviews, technical design, issue triage, root cause analysis, and documentation. The role requires experience in designing and writing software, algorithms and data structures, troubleshooting distributed systems, risk management, and workflow automation.

What you'd actually do

  1. Develop and improve code for tools, libraries, and systems with a focus on reliability, scalability, and efficiency.
  2. Review code from other engineers to ensure adherence to best practices in style, accuracy, testability, and maintainability.
  3. Lead and participate in technical design reviews to select appropriate technologies and create simple, robust solutions.
  4. Triage and resolve system issues by debugging, analyzing root causes, and implementing preventive measures that improve operational quality.
  5. Create and maintain technical documentation and educational content, adapting it based on product updates and user feedback.
  6. Champion engineering best practices, contribute to community work such as hiring, and drive the adoption of improved standards.

Skills

Required

  • Designing, writing, testing, and maintaining software applications
  • Selecting and designing algorithms and data structures to improve system scale, speed, and reliability
  • Troubleshooting and performing root cause analysis on large-scale distributed systems
  • Debugging large-scale distributed systems
  • Anticipating, assessing, and managing risks to systems and tools
  • Simplifying or automating processes and systems

What the JD emphasized

  • Large-scale distributed systems
  • Debugging large-scale distributed systems
  • System risk management
  • Workflow automation