Software Engineer, Data Infrastructure

Cursor · Coding AI · San Francisco, CA · Engineering

This is a Software Engineer, Data Infrastructure role at Cursor, a company focused on automating coding. The engineer owns and operates the data pipelines and storage systems that power model improvement, evals, and experimentation, with a focus on correctness, cost, and ergonomics. The role requires experience with Spark and Ray Data, and with debugging performance issues across the data stack.

What you'd actually do

  1. Own the full ladder: patch what should be patched, redesign what should be redesigned, ship the replacement, and operate it.
  2. Design and ship the replacement for a core pipeline while keeping the existing system running.
  3. For new product surfaces that lack instrumentation, define what needs to be captured and wire it through.
  4. Fix instrumentation gaps, add contracts to prevent recurrence, and ship dashboards to catch issues earlier.
  5. Design schema evolution and validation for multiple consumers that depend on overlapping data.
  6. Decide what data is worth keeping, implement retention and compression, and delete what is not.

Skills

Required

  • Spark (Databricks or open-source Spark)
  • Ray Data
  • Large data pipelines
  • Storage systems
  • Debugging performance issues
  • Data modeling
  • Long-term maintainability

Nice to have

  • ClickHouse
  • dbt
  • Dagster

What the JD emphasized

  • Has built real systems at scale
  • Cares about correctness, cost, and ergonomics
  • Deep experience with Spark
  • Production experience with Ray Data
  • Hands-on ownership of large data pipelines and storage systems
  • Comfort debugging performance issues across client instrumentation, streaming, storage, and model-facing workflows, as well as across the compute, storage, and networking layers
  • Clear thinking about data modeling and long-term maintainability
  • Good judgment about when to patch and when to rebuild

Other signals

  • Data infrastructure is what turns raw data into something teams can trust.
  • Privacy guarantees are part of correctness.