Forward Deployed Engineer - Data as a Service

Snorkel AI · Data AI · Remote · 310 - DaaS FDE

Snorkel AI is seeking a Forward Deployed Engineer to work on AI/ML data products for enterprise clients. This role involves end-to-end ownership of the AI data pipeline lifecycle, including developing and deploying ML-based workflows, building human-in-the-loop (HITL) data generation and review processes, generating synthetic datasets, and packaging production-grade datasets. The engineer will also define production specifications, build evaluators, design quality measurement systems, and perform custom model benchmarking.

What you'd actually do

  1. Build and deploy evaluators, and design and implement quality measurement systems, to validate project outputs and ensure deliverables meet client expectations
  2. Generate synthetic datasets by developing or adapting existing pipelines to accelerate client engagements and augment training data
  3. Package and deliver production-grade datasets with standardized formatting, comprehensive documentation, and quality assurance
  4. Configure and build custom applications and off-platform solutions for non-standard or specialized client requirements
  5. Define production specifications and workflows, securing technical alignment with client teams to enable seamless go-live transitions

Skills

Required

  • Python
  • SQL
  • data science
  • data engineering
  • solution development
  • ML techniques in production
  • validation and evaluation of ML and LLM-based solutions
  • API integration

Nice to have

  • LLM-based solutions

What the JD emphasized

  • 4+ years of experience in data science, engineering, or solution development roles
  • Strong practical experience with Python and SQL data tooling required
  • Familiarity with ML and LLM-based solutions, applying ML techniques in production contexts, and validation and evaluation of ML and LLM-based solutions

Other signals

  • AI/ML data products
  • end-to-end ownership of AI data pipeline lifecycle
  • human-in-the-loop (HITL) data generation and review
  • ML-based applications
  • generate synthetic datasets
  • package and deliver production-grade datasets
  • custom model benchmarking