Senior Data Engineer

ZoomInfo · Enterprise · Toronto, ON · 908 Product Management - Analytics

We are looking for a Senior Data Engineer to design and expand enterprise-level data infrastructure that enables internal teams to interact with data comprehensively. The role involves integrating diverse data sources into AI applications, including LLM-powered systems, and implementing architectures for Retrieval Augmented Generation (RAG) and advanced search.

What you'd actually do

  1. Design, develop, and maintain high-performance, product-centric data pipelines using Airflow, dbt, and Python.
  2. Architect and optimize the massive-scale data warehouse and lakehouse that serves as our single source of truth for all customer data, primarily using Snowflake.
  3. Lead the integration of diverse structured and unstructured data sources (e.g., web data, third-party APIs) into our data ecosystem, ensuring high-quality and reliable ingestion.
  4. Implement and enforce Model Context Protocol (MCP) or similar architectures to feed accurate and contextual data into our LLM-powered products for applications like Retrieval Augmented Generation (RAG) and advanced search.
  5. Define, monitor, and enforce data quality SLAs across all pipelines and products, ensuring data accuracy and lineage are a top priority.

Skills

Required

  • Expert-level SQL
  • Strong Python programming skills
  • Production-level experience with large-scale batch and streaming data processing
  • dbt (data build tool)
  • Snowflake data warehouse design, optimization, and cost modeling
  • Model Context Protocol (MCP) or similar architectures
  • Data lakes, event-driven architectures (e.g., Kafka), ETL/ELT, and data mesh
  • Cloud platforms (GCP and/or AWS)
  • Infrastructure as code (e.g., Terraform)
  • Excellent communication skills
  • Strategic and product-oriented thinking
  • Leadership and mentorship
  • Stakeholder management
  • Agility and adaptability
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field
  • 8+ years of progressive experience in data engineering

Nice to have

  • LLMOps
  • LangChain
  • RAG (Retrieval Augmented Generation) pipelines
  • Building embedding models or pipelines for Named Entity Recognition (NER)
  • Data cataloging tools (e.g., OpenLineage) and lineage tracking
  • Other distributed processing systems and databases (e.g., Flink, DynamoDB)

What the JD emphasized

  • product-centric data pipelines
  • massive-scale data warehouse and lakehouse
  • diverse structured and unstructured data sources
  • LLM-powered products
  • Retrieval Augmented Generation (RAG)
  • data quality SLAs
  • data-centric product company

Other signals

  • Designing and expanding enterprise-level data infrastructure that enables ZoomInfo's internal teams to interact with data comprehensively
  • Integrating vast, diverse data sources into our AI applications, including our industry-leading LLM-powered systems
  • Implement and enforce Model Context Protocol (MCP) or similar architectures to feed accurate and contextual data into our LLM-powered products for applications like Retrieval Augmented Generation (RAG) and advanced search