Software Engineer III, Telemetry Infrastructure

Google · Big Tech · Sunnyvale, CA +1

This role focuses on building and maintaining a high-performance telemetry infrastructure to support demanding workloads, including LLM training. The engineer will write product/system development code, participate in design reviews, review code, contribute to documentation, and triage/debug system issues. While the role supports AI workloads, its core craft is infrastructure and telemetry, not direct AI/ML model development.

What you'd actually do

  1. Write product or system development code.
  2. Participate in or lead design reviews with peers and stakeholders to decide amongst available technologies.
  3. Review code developed by other developers and provide feedback to ensure best practices (e.g., style guidelines, checking code in, accuracy, testability, and efficiency).
  4. Contribute to existing documentation or educational content and adapt content based on product/program updates and user feedback.
  5. Triage product or system issues, then debug, track, and resolve them by analyzing the sources of issues and their impact on hardware, network, or service operations and quality.

Skills

Required

  • software development
  • large-scale infrastructure
  • distributed systems
  • networks
  • compute technologies
  • storage
  • hardware architecture
  • lower-level code
  • networking stacks
  • streaming Remote Procedure Calls (RPCs)

Nice to have

  • data structures
  • algorithms
  • accessible technologies
  • storage systems
  • parallel systems
  • Graphics Processing Unit (GPU) related coding