Software Engineer I, Coding Pod

Handshake · Enterprise · San Francisco, CA · Engineering

A Software Engineer on the Coding Pod will build data infrastructure and pipelines for frontier AI coding models, focusing on creating large-scale, high-quality benchmark datasets for evaluating model performance on coding tasks. The role involves owning end-to-end data pipelines, integrating with developer ecosystems, and working with evaluation systems and agentic coding tools.

What you'd actually do

  1. Design and build scalable data pipelines for generating, transforming, and validating large-scale coding datasets
  2. Develop systems for task generation, dataset curation, and quality assurance, including automated and human-in-the-loop evaluation workflows
  3. Integrate with developer ecosystems (e.g., GitHub) and build tooling that supports real-world coding environments
  4. Work with containerized environments (e.g., Docker) to safely execute and evaluate code at scale (see the sketch after this list)
  5. Build backend systems and APIs that power dataset delivery and model evaluation pipelines
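Items 1, 2, and 4 amount to a generate, validate, execute, and grade loop. As a purely illustrative sketch in Python, and not Handshake's actual stack: CodingTask, validate, run_in_sandbox, and grade are hypothetical names, and the docker run flags show one plausible way to isolate untrusted code, not a prescribed setup.

    import json
    import subprocess
    import tempfile
    from dataclasses import dataclass
    from pathlib import Path


    @dataclass
    class CodingTask:
        """One benchmark sample: a candidate solution plus the tests that define success."""
        task_id: str
        solution_code: str  # the code under evaluation
        test_code: str      # a script that imports the solution and asserts on it


    def validate(task: CodingTask) -> bool:
        """Cheap quality gate before paying for container execution."""
        return bool(task.solution_code.strip()) and "assert" in task.test_code


    def run_in_sandbox(task: CodingTask, timeout_s: int = 60) -> dict:
        """Run the tests in a throwaway, network-less container and report the outcome."""
        with tempfile.TemporaryDirectory() as workdir:
            Path(workdir, "solution.py").write_text(task.solution_code)
            Path(workdir, "test_solution.py").write_text(task.test_code)
            try:
                proc = subprocess.run(
                    [
                        "docker", "run", "--rm",
                        "--network=none",        # untrusted code gets no network
                        "--memory=512m", "--cpus=1",
                        "-v", f"{workdir}:/work:ro",
                        "python:3.12-slim",
                        "python", "/work/test_solution.py",
                    ],
                    capture_output=True, text=True, timeout=timeout_s,
                )
            except subprocess.TimeoutExpired:
                return {"status": "timeout", "log": ""}
            status = "pass" if proc.returncode == 0 else "fail"
            return {"status": status, "log": (proc.stdout + proc.stderr)[-2000:]}


    def grade(task: CodingTask) -> dict:
        """Produce one machine-readable record for the dataset."""
        if not validate(task):
            return {"task_id": task.task_id, "status": "rejected"}
        return {"task_id": task.task_id, **run_in_sandbox(task)}


    if __name__ == "__main__":
        task = CodingTask(
            task_id="demo-001",
            solution_code="def add(a, b):\n    return a + b\n",
            test_code="from solution import add\nassert add(2, 3) == 5\nprint('ok')\n",
        )
        print(json.dumps(grade(task), indent=2))

A production pipeline would add retries, resource accounting, and durable result storage; the sketch only shows the shape of the execute-and-grade step.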

Skills

Required

  • 2+ years of professional software engineering experience, with exposure to backend systems or data engineering
  • Strong programming skills in a language such as Python or TypeScript
  • Experience building or maintaining data pipelines, ETL systems, or distributed systems
  • Familiarity with containerization (e.g., Docker) and cloud infrastructure (AWS, GCP, or similar)
  • Understanding of databases and data modeling (SQL or NoSQL)
  • Comfort working across system boundaries, from infrastructure to developer-facing tools
  • Strong problem-solving skills and attention to detail, especially in data quality and correctness
  • Effective communication skills and ability to collaborate cross-functionally

Nice to have

  • Experience working with machine learning datasets, evaluation frameworks, or benchmarking systems
  • Familiarity with coding agents, developer tools, or AI-assisted programming systems
  • Experience integrating with APIs such as GitHub's, or building developer platform tooling
  • Exposure to workflow orchestration tools (e.g., Airflow, Temporal) or distributed job systems (see the sketch after this list)
  • Experience designing automated testing or grading systems for code
  • Background in high-growth or infrastructure-heavy engineering environments
  • Interest in the intersection of AI, developer productivity, and real-world software engineering workflows
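On the API-integration and orchestration bullets above: as a hedged sketch, assuming a recent Airflow 2.x install and with a placeholder DAG id, schedule, and target repository, a minimal DAG whose single task pulls repository metadata from GitHub's public REST API could look like this.

    import json
    import urllib.request
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def fetch_repo_metadata():
        # GitHub's public REST endpoint for repository metadata.
        # Unauthenticated calls are rate-limited; a real pipeline would send a token.
        url = "https://api.github.com/repos/python/cpython"
        with urllib.request.urlopen(url, timeout=30) as resp:
            repo = json.load(resp)
        # A real pipeline would land this in a warehouse table; here we just log two fields.
        print(repo["full_name"], repo["stargazers_count"])


    with DAG(
        dag_id="github_repo_ingest",   # hypothetical name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(
            task_id="fetch_repo_metadata",
            python_callable=fetch_repo_metadata,
        )

Orchestrators like Airflow or Temporal earn their keep once pipelines like this grow retries, backfills, and dependencies between steps.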

What the JD emphasized

  • frontier AI
  • large-scale
  • high-quality
  • real-world
  • economically valuable
  • end-to-end data pipelines
  • human-in-the-loop evaluation systems
  • agentic coding tools
  • automated assessment frameworks
  • developer ecosystems
  • real-world coding environments
  • containerized environments
  • execute and evaluate code at scale
  • dataset delivery
  • model evaluation pipelines
  • evaluation methodologies
  • dataset quality
  • automated grading
  • benchmarking
  • assessment systems
  • pipeline performance
  • reliability
  • scalability
  • distributed systems
  • data infrastructure
  • evaluation systems
  • pipeline orchestration

Other signals

  • building data infrastructure and pipelines for frontier AI coding models
  • creating large-scale, high-quality benchmark datasets that evaluate how models perform on real-world, economically valuable coding tasks
  • own end-to-end data pipelines—from task generation and dataset construction to quality assurance and delivery
  • work with human-in-the-loop evaluation systems, agentic coding tools, and automated assessment frameworks
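The last signal pairs automated assessment with human review. One way to picture that pairing, as a hypothetical schema rather than anything the posting specifies, is a record in which a reviewer's verdict can override the automated grader's.

    from dataclasses import dataclass
    from typing import Optional


    @dataclass
    class EvaluationRecord:
        """Hypothetical shape for one graded sample in a benchmark dataset."""
        task_id: str
        automated_verdict: str                 # e.g. "pass" or "fail" from the test harness
        human_verdict: Optional[str] = None    # set when a reviewer audits the sample
        reviewer_note: Optional[str] = None

        def final_verdict(self) -> str:
            # Human judgment, when present, overrides the automated grader:
            # that override path is the human-in-the-loop part of the workflow.
            return self.human_verdict or self.automated_verdict


    # Usage: an automated "fail" that a reviewer overturned after inspection.
    record = EvaluationRecord(
        task_id="demo-001",
        automated_verdict="fail",
        human_verdict="pass",
        reviewer_note="Flaky test; the solution itself is correct.",
    )
    assert record.final_verdict() == "pass"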