Senior Software Engineer, Cloud Infrastructure Monitoring

Google · Big Tech · Taipei, Taiwan

This role is for a Senior Software Engineer on the AI and Infrastructure team, focused on designing and measuring high-performance distributed machine monitoring software. The role involves leading the implementation of system modeling templates for New Product Introductions (NPIs) through AI adoption and performance analysis, developing technical roadmaps, and partnering with stakeholders to execute engineering programs. The goal is to empower Google customers with breakthrough capabilities and insights by delivering AI and Infrastructure at scale.

What you'd actually do

  1. Lead the design and measurement of high-performance distributed machine monitoring software, including messaging layers, C/C++ libraries, and system daemons.
  2. Direct the implementation of system modeling templates for New Product Introductions (NPIs) through strategic AI adoption and performance analysis.
  3. Develop technical roadmaps to influence executive decisions and define "Best Known Method" (BKM) practices across multiple engineering teams.
  4. Partner with internal and external stakeholders to execute complex engineering programs, focusing on product development and productivity enhancements.
  5. Foster team growth by modeling high technical standards, providing constructive feedback, and contributing innovative ideas to the engineering organization.

Skills

Required

  • software development in one or more programming languages
  • testing, maintaining, or launching software products
  • software design and architecture
  • developing large-scale infrastructure, distributed systems, or networks, or working with compute technologies, storage, or hardware architecture

Nice to have

  • Baseboard Management Controller (BMC) (AMI or OpenBMC), Unified Extensible Firmware Interface (UEFI) Basic Input/Output System (BIOS), or Linux kernel development and system diagnostics
  • software/hardware integration, system bring-up, and open-source development workflows
  • Redfish, Open Compute Project (OCP) standards, and x86 server architecture
  • Python and shell scripting for automation and system management
  • advancing system modeling descriptions and driving structural improvements across large-scale hardware programs

What the JD emphasized

  • high-performance distributed machine monitoring software
  • strategic AI adoption
  • performance analysis
  • large-scale infrastructure
  • distributed systems