Senior Data Engineer, GTM

Google · Big Tech · Mountain View, CA +3

Senior Data Engineer role focused on building data pipelines and infrastructure to process unstructured customer feedback data, integrating AI agents and LLMs for insights and routing. The role involves NLP, embedding workflows, and MLOps/LLMOps principles.

What you'd actually do

  1. Design and maintain pipelines to ingest, clean, and process massive volumes of unstructured data, including business transcripts and support cases, into reliable analytical datasets.
  2. Architect and deploy advanced platforms and tooling that empower the team to leverage autonomous AI agents and Large Language Models (LLMs) for intelligent routing and automated insights.
  3. Develop internal libraries and self-serve frameworks that streamline Natural Language Processing (NLP) and causal analysis, significantly reducing operational friction and enhancing team productivity.
  4. Manage and optimize embedding workflows using TensorFlow and Tensor Processing Units (TPUs), ensuring efficient processing that bypasses standard API constraints for high-volume data.
  5. Implement automated monitoring, alerting, and rigorous data quality checks to guarantee the security, reliability, and governance of high-stakes analytical assets.

Skills

Required

  • Python
  • SQL
  • MLOps
  • LLMOps
  • data infrastructure
  • text processing pipelines
  • embedding pipelines
  • data pipelines
  • data schemas
  • unstructured text data
  • machine learning workflows

Nice to have

  • Google Colaboratory (Colab)
  • TensorFlow
  • Tensor Processing Units (TPUs)
  • agentic tools and platforms
  • LLM orchestration
  • agentic infrastructure

What the JD emphasized

  • transforming massive volumes of unstructured conversational data
  • customer feedback and product decisions
  • accelerating the Ads product adoption flywheel
  • shaping Go-to-Market (GTM) strategy
  • own and architect the foundational infrastructure
  • transforms unstructured customer feedback
  • quantified strategic assets
  • scalable, automated pipelines
  • integrate sales transcripts
  • Business Intelligence (BI)
  • pioneer our transition
  • flexible workflows
  • core infrastructure and platforms
  • multiply our data science team's capacity, agility, and impact
  • end-to-end delivery
  • production-ready solutions
  • ingest, clean, and process massive volumes of unstructured data
  • business transcripts and support cases
  • reliable analytical datasets
  • Architect and deploy advanced platforms and tooling
  • leverage autonomous AI agents and Large Language Models (LLMs)
  • intelligent routing and automated insights
  • Develop internal libraries and self-serve frameworks
  • streamline Natural Language Processing (NLP) and causal analysis
  • significantly reducing operational friction
  • enhancing team productivity
  • Manage and optimize embedding workflows
  • TensorFlow and Tensor Processing Units (TPUs)
  • efficient processing
  • bypasses standard API constraints
  • high-volume data
  • Implement automated monitoring, alerting, and rigorous data quality checks
  • guarantee the security, reliability, and governance
  • high-stakes analytical assets
  • 5 years of experience coding in Python and SQL
  • 5 years of experience working with machine learning operations (MLOps) and large language model operations (LLMOps) principles and data infrastructure
  • deploying text processing and embedding pipelines
  • 5 years of experience designing and deploying data pipelines
  • managing data schemas
  • processing unstructured text data
  • machine learning (ML) workflows
  • Experience with LLM orchestration and agentic infrastructure

Other signals

  • transforming massive volumes of unstructured conversational data into quantified, trusted insights
  • architect the foundational infrastructure that transforms unstructured customer feedback into quantified strategic assets
  • scalable, automated pipelines that integrate sales transcripts with critical Business Intelligence (BI)
  • pioneer our transition towards more flexible workflows, developing the core infrastructure and platforms that multiply our data science team's capacity, agility, and impact through end-to-end delivery of production-ready solutions