Software Engineer - Core Services

Snorkel AI · Data AI · Redwood City, CA (+1 other location) · 312 - Engineering

This Software Engineer role sits on the Core Services team at Snorkel AI and focuses on building and maintaining the data platform that powers Snorkel's AI solutions. The role involves designing event-driven data flows, implementing data governance and lineage tracking, instrumenting the platform for observability, and optimizing infrastructure costs. The position emphasizes Python, SQL, AWS, Kubernetes, and data orchestration tools, and also involves integrating AI SRE tooling and AI-assisted development workflows.

What you'd actually do

  1. Build and maintain the shared data access library and SDKs that Platform, Packaging, and Dataset API teams use to read from and write to multiple data sources (Snowflake, S3, RDS). Design interfaces that abstract source-level complexity while providing built-in auth, RBAC enforcement, pagination, and query governance.
  2. Design and implement event-driven data flows using event brokers, change data capture (CDC) connectors, a schema registry, event routing, and dead letter queues. Ensure that events flow reliably and that failures are visible and recoverable.
  3. Build the systems that track how data moves through the platform (lineage), enforce who can access what (governance and RBAC), and log what happened (auditing). This includes PII handling, retention policy enforcement, and audit infrastructure for enterprise and federal compliance.
  4. Instrument the data platform with OpenTelemetry, define and monitor SLOs for query latency and pipeline success rates, and build alerting that catches issues before they become incidents. You will be on-call for the systems you build.
  5. Contribute to infrastructure cost visibility and optimization: query cost estimation, workload right-sizing, and routing data to the most cost-effective storage tier for its access pattern.

Skills

Required

  • Python
  • SQL
  • AWS
  • Kubernetes
  • Prefect
  • FastAPI
  • dbt
  • Snowflake
  • RDS
  • S3
  • EKS
  • EventBridge
  • IAM
  • Terraform
  • AI-assisted development tools

Nice to have

  • shared libraries or SDKs
  • event-driven architectures
  • CDC
  • event buses
  • schema registries
  • at-least-once delivery semantics
  • OpenTelemetry
  • ClickHouse
  • observability infrastructure
  • regulated environments
  • SOC 2
  • FedRAMP
  • HIPAA
  • Ray

What the JD emphasized

  • AI-assisted development tools