Senior Machine Learning Engineer

ZoomInfo · Enterprise · MA · Remote · 867 Engineering - Data Science

Senior Machine Learning Engineer role focused on building and productionizing AI/ML systems for entity resolution, knowledge graphs, and agentic workflows on large-scale datasets. The role involves developing transformer/RAG architectures, retrieval pipelines, NER models, and cross-dataset entity resolution frameworks, with an emphasis on performance optimization and deployment.

What you'd actually do

  1. Invent and productionize Transformer/RAG/Graph RAG architectures that surface the right contact, company, or insight, and drive quantization, distillation, and SLM fine-tuning (GTE-Qwen, ModernBERT) so models stay fast and affordable at petabyte scale
  2. Prototype and launch hybrid dense/sparse retrieval pipelines on vector DBs to build language-agnostic clustering and classification systems that power our intelligence layer
  3. Own high-recall NER models that tag people, orgs, locations, and industry-specific entities across multi-language text, extracting structured insights from web data to improve our signal detection capabilities
  4. Build cross-dataset entity-resolution frameworks that dedupe and merge hundreds of millions of fragmented company and person records with sub-second latency, creating enriched, unified entities enhanced with knowledge-graph signals
  5. Design and implement agentic workflows with robust evaluation frameworks focused on NER and entity resolution tasks, including large-scale A/B and back-testing plans that close the loop from experiment to KPI uplift

Skills

Required

  • Transformer architectures (BERT/GPT/T5)
  • RAG systems
  • Vector databases
  • Information retrieval
  • NER
  • Entity resolution
  • Knowledge graphs
  • Quantization
  • Distillation
  • Fine-tuning
  • Python
  • PyTorch or TensorFlow
  • Large-scale data processing
  • Low-latency systems
  • MLOps

Nice to have

  • Go/Java
  • Graph RAG
  • SLM fine-tuning

What the JD emphasized

  • end-to-end owner
  • entity resolution
  • knowledge-graph
  • billions of records
  • millions of daily queries
  • productionize
  • quantization
  • distillation
  • SLM fine-tuning
  • petabyte scale
  • vector DBs
  • high-recall NER models
  • multi-language text
  • sub-second latency
  • agentic workflows
  • robust evaluation frameworks
  • large-scale A/B and back-testing
  • production reliability
  • measurable ML KPIs
  • mentoring junior scientists and engineers
  • end-to-end project ownership
  • scalable ML solutions
  • 6+ years of hands-on ML/NLP experience
  • delivered revenue-impacting products in production environments
  • Deep expertise in modern AI architectures
  • transformer stacks
  • RAG systems
  • vector-based information retrieval
  • latency/throughput optimization techniques
  • Proven track record at 100M+ record scale
  • record linkage
  • data deduplication
  • knowledge-graph integration
  • Strong applied research capabilities
  • software-engineering rigor

Other signals

  • entity resolution
  • knowledge graph
  • transformer architectures
  • RAG
  • vector databases
  • NER
  • agentic workflows