Staff Software Engineer - Core Services

Snorkel AI · Data AI · Redwood City, CA +1 · 312 - Engineering

This Staff Software Engineer role on Core Services at Snorkel AI focuses on building and maintaining the data platform that powers the company's AI solutions. The work spans designing event-driven data flows, implementing data governance and lineage tracking, instrumenting the platform for observability, and optimizing infrastructure costs. The engineer will also help modernize CI/CD and integrate AI SRE tooling, shaping the team's AI-native development workflow.

What you'd actually do

  1. Build and maintain the shared data access library and SDKs that Platform, Packaging, and Dataset API teams use to read from and write to multiple data sources (Snowflake, S3, RDS). Design interfaces that abstract source-level complexity while providing built-in auth, RBAC enforcement, pagination, and query governance.
  2. Design and implement event-driven data flows using event brokers, CDC connectors, a schema registry, event routing, and dead-letter queues. Ensure events flow reliably and that failures are visible and recoverable.
  3. Build the systems that track how data moves through the platform (lineage), enforce who can access what (governance and RBAC), and log what happened (auditing). This includes PII handling, retention policy enforcement, and audit infrastructure for enterprise and federal compliance.
  4. Instrument the data platform with OpenTelemetry, define and monitor SLOs for query latency and pipeline success rates, and build alerting that catches issues before they become incidents. You will be on-call for the systems you build.
  5. Contribute to infrastructure cost visibility and optimization: query cost estimation, workload right-sizing, and routing data to the most cost-effective storage tier for its access pattern.
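To make item 1 concrete, here is a minimal, purely illustrative sketch of what a shared data access library with built-in RBAC enforcement might look like. All names (`AccessLibrary`, `QueryRequest`, `DataSource`) are hypothetical, not Snorkel's actual API:

```python
from dataclasses import dataclass
from typing import Iterator, Protocol


@dataclass(frozen=True)
class QueryRequest:
    sql: str
    principal: str      # identity used for RBAC checks
    page_size: int = 500


class DataSource(Protocol):
    """Interface each backend (e.g. Snowflake, S3, RDS) would implement."""
    def execute(self, request: QueryRequest) -> Iterator[list[dict]]: ...


class AccessLibrary:
    """Single entry point consumers import; hides source-level details."""

    def __init__(self, sources: dict[str, DataSource], allowed: dict[str, set[str]]):
        self._sources = sources
        self._allowed = allowed  # principal -> set of permitted source names

    def query(self, source: str, request: QueryRequest) -> Iterator[list[dict]]:
        # RBAC is enforced once, in the library, rather than in every caller.
        if source not in self._allowed.get(request.principal, set()):
            raise PermissionError(f"{request.principal} may not read {source}")
        return self._sources[source].execute(request)
```

The design point is that consumers (Platform, Packaging, Dataset API teams) never talk to a backend directly, so auth, pagination, and governance checks cannot be skipped.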
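For item 2, the "failures are visible and recoverable" requirement usually means poison messages end up in a dead-letter queue instead of being dropped. A toy sketch of that pattern (not tied to any specific broker; the function name is hypothetical):

```python
def process_with_dlq(events, handler, dlq, max_attempts=3):
    """Deliver each event at-least-once; route poison messages to the DLQ.

    events: iterable of event payloads.
    handler: callable that processes one event, raising on failure.
    dlq: list (stand-in for a real dead-letter queue) that failures land in.
    """
    for event in events:
        for attempt in range(1, max_attempts + 1):
            try:
                handler(event)
                break  # delivered successfully
            except Exception as exc:
                if attempt == max_attempts:
                    # Failure stays visible and replayable instead of lost.
                    dlq.append({"event": event, "error": str(exc)})
```

A real implementation would add backoff between retries and emit metrics on DLQ depth, but the shape (retry, then park with context) is the same.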
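Item 3 couples lineage, governance, and auditing: an append-only audit record can double as the source of lineage edges. A hypothetical in-memory sketch (a production version would write to durable, tamper-evident storage):

```python
import datetime


class AuditLog:
    """Append-only record of reads/writes, usable for lineage and compliance audits."""

    def __init__(self):
        self.entries = []

    def record(self, principal, action, dataset, upstream=()):
        self.entries.append({
            "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "principal": principal,
            "action": action,            # "read" or "write"
            "dataset": dataset,
            "upstream": list(upstream),  # lineage edges: inputs to this write
        })

    def lineage(self, dataset):
        """Datasets that directly fed writes into `dataset`."""
        edges = set()
        for entry in self.entries:
            if entry["action"] == "write" and entry["dataset"] == dataset:
                edges.update(entry["upstream"])
        return edges
```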
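For item 4, "define and monitor SLOs for query latency and pipeline success rates" reduces to evaluating a window of measurements against targets. An illustrative sketch (thresholds and the `check_slos` name are invented for the example):

```python
def check_slos(samples, latency_slo_ms=250.0, success_slo=0.99):
    """Evaluate a window of query samples against latency and success SLOs.

    samples: list of (latency_ms, succeeded) tuples.
    Returns a list of alert strings; an empty list means both SLOs are met.
    """
    alerts = []
    if not samples:
        return alerts
    latencies = sorted(ms for ms, _ in samples)
    p95 = latencies[int(0.95 * (len(latencies) - 1))]  # nearest-rank p95
    success_rate = sum(1 for _, ok in samples if ok) / len(samples)
    if p95 > latency_slo_ms:
        alerts.append(f"p95 latency {p95:.0f}ms exceeds SLO {latency_slo_ms:.0f}ms")
    if success_rate < success_slo:
        alerts.append(f"success rate {success_rate:.2%} below SLO {success_slo:.0%}")
    return alerts
```

In practice the samples would come from OpenTelemetry metrics and the alerts would feed a pager, but the SLO arithmetic is this simple at its core.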
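Item 5's "routing data to the most cost-effective storage tier for its access pattern" is, at heart, a small policy function over access statistics. A deliberately simplified sketch (tier names and thresholds are made up for illustration):

```python
def pick_storage_tier(reads_per_day: float, latency_sensitive: bool) -> str:
    """Route a dataset to the cheapest tier its access pattern tolerates."""
    if latency_sensitive or reads_per_day > 100:
        return "hot"    # e.g. a warehouse or RDS: fast, expensive
    if reads_per_day > 1:
        return "warm"   # e.g. S3 standard: cheaper, slower
    return "cold"       # e.g. archival object storage: cheapest, slowest
```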

Skills

Required

  • 8+ years building platform infrastructure, data infrastructure, data platforms, or backend systems with significant data components.
  • Strong proficiency in Python.
  • Hands-on experience with SQL and at least two of: Snowflake, Redshift, Postgres.
  • Experience with AWS — S3, RDS, EKS, EventBridge, IAM.
  • Experience with Kubernetes.
  • Familiarity with data orchestration tools (Prefect, Airflow, or Dagster) and transformation frameworks (dbt).
  • Understanding of data governance concepts — RBAC, PII handling, audit logging, data lineage.
  • Fluency with AI-assisted development tools (Claude Code, Cursor, or similar).

Nice to have

  • Experience building shared libraries or SDKs consumed by multiple teams — versioning, backwards compatibility, migration support.
  • Experience with event-driven architectures — CDC, event buses, schema registries, at-least-once delivery semantics.
  • Experience with OpenTelemetry, ClickHouse, or similar observability infrastructure.
  • Prior work in regulated environments (SOC 2, FedRAMP, HIPAA) where compliance requirements shaped system design.
  • Experience with Ray for distributed compute workloads.

What the JD emphasized

  • AI-native development workflow
  • Fluency with AI-assisted development tools (Claude Code, Cursor, or similar). This is a hard requirement — the team uses these tools daily and we expect engineers to leverage them for code generation, debugging, and investigation.