Staff Software Engineer, Agentic AI, Trust and Safety

Google · Big Tech · Kirkland, WA +1

Staff Software Engineer role focused on architecting and scaling AI/ML systems and distributed infrastructure for Trust and Safety at Google. The role involves defining the technology roadmap, integrating high-availability systems, leading multi-team technical initiatives, and mentoring other engineers. Requires experience in large-scale distributed systems and building agentic AI systems.

What you'd actually do

  1. Define, advocate, and execute the overarching Trust and Safety technology roadmap, architecting next-generation AI/ML systems and highly reliable distributed infrastructure to automate and scale global user protection.
  2. Oversee the integration of high-availability, low-latency production systems with stringent Service Level Objective (SLO) guarantees, driving excellence across system bottlenecks, data consistency, capacity planning, and cost-efficiency.
  3. Steer critical, multi-team technical initiatives from initial abstract discovery through to large-scale deployment, translating high-level business goals into parallelizable engineering workstreams.
  4. Define standards for fault-tolerant architectures while mentoring Tech Leads in industry best practices across code quality, CI/CD, comprehensive testing, and systemic technical debt reduction.
  5. Partner closely with Product, Policy, and Data Science leadership to co-create the global technology stack, serving as a trusted advisor to executives and abstracting complex technical trade-offs for non-technical stakeholders.

Skills

Required

  • designing and implementing large-scale distributed systems
  • machine learning (ML) infrastructure
  • architectural ownership for distributed systems or infrastructure components
  • building and deploying agentic AI systems

Nice to have

  • Master’s degree or PhD in Engineering, Computer Science, or a related technical field
  • managing rapid technical iteration and 0-to-1 innovation
  • managing deep technical ambiguity
  • defining organization-wide technical strategies
  • establishing engineering best practices
  • mentoring Executive Engineers and Tech Leads
  • Trust and Safety, content moderation, security, or anti-abuse engineering at a global scale
  • managing billions of daily events or real-time streaming data
  • strong technical communication skills
  • translating complex architectural trade-offs and AI capabilities into recommendations for cross-functional executives

What the JD emphasized

  • architecting next-generation AI/ML systems
  • highly reliable distributed infrastructure
  • automate and scale global user protection
  • high-availability, low-latency production systems
  • stringent Service Level Objective (SLO) guarantees
  • multi-team technical initiatives
  • large-scale deployment
  • fault-tolerant architectures
  • building and deploying agentic AI systems
  • global scale
  • billions of daily events or real-time streaming data

Other signals

  • architecting next-generation AI/ML systems
  • building and deploying agentic AI systems
  • automating and scaling global user protection