Senior Software Engineer, Cloud Infrastructure Monitoring

Google · Big Tech · Taipei, Taiwan

This is a Senior Software Engineer role on Google's AI and Infrastructure team, focused on designing and measuring high-performance distributed machine monitoring software. The engineer leads the implementation of system modeling templates through strategic AI adoption and performance analysis, develops technical roadmaps, and partners with stakeholders to execute complex engineering programs. The team's stated goal is to deliver AI and infrastructure at scale to Google customers, supporting the development of AI models and providing essential platforms for developers.

What you'd actually do

  1. Lead the design and measurement of high-performance distributed machine monitoring software, including messaging layers, C/C++ libraries, and system daemons.
  2. Direct the implementation of system modeling templates for New Product Introductions (NPIs) through strategic AI adoption and performance analysis.
  3. Develop technical roadmaps to influence executive decisions and define "Best Known Method" (BKM) practices across multiple engineering teams.
  4. Partner with internal and external stakeholders to execute complex engineering programs, focusing on product development and productivity enhancements.
  5. Foster team growth by modeling high technical standards, providing constructive feedback, and contributing innovative ideas to the engineering organization.

Skills

Required

  • software development
  • software design
  • software architecture
  • large-scale infrastructure
  • distributed systems
  • networks
  • compute technologies
  • storage
  • hardware architecture

Nice to have

  • Baseboard Management Controller (BMC) (AMI or OpenBMC)
  • Unified Extensible Firmware Interface (UEFI) Basic Input/Output System (BIOS)
  • Linux kernel development
  • system diagnostics
  • software/hardware integration
  • system bring-up
  • open-source development workflows
  • Redfish
  • Open Compute Project (OCP) standards
  • server x86 computer architecture
  • Python
  • shell scripting
  • automation
  • system management
  • system modeling descriptions
  • structural improvements across large-scale hardware programs

What the JD emphasized

  • high-performance distributed machine monitoring software
  • strategic AI adoption
  • performance analysis
  • large-scale infrastructure
  • distributed systems
  • networks
  • compute technologies
  • storage
  • hardware architecture