Software Engineer, Data Acquisition

OpenAI · AI Frontier · San Francisco, CA · Research

Software Engineer focused on data acquisition for model training, involving web crawling, data ingestion, and large-scale distributed systems.

What you'd actually do

  1. Own and lead data acquisition engineering projects, including web crawling, data ingestion, and search.
  2. Collaborate with other sub-teams, such as Data Processing, Architecture, and Scaling, to ensure smooth data flow and system operability.
  3. Work closely with the legal team on compliance and data privacy matters.
  4. Develop and deploy highly scalable distributed systems capable of handling petabytes of data.
  5. Architect and implement algorithms for data indexing and search capabilities.

Skills

Required

  • 4+ years of industry experience in software development
  • Strong expertise in large stateful distributed systems and data processing
  • Proficiency in Kubernetes and Infrastructure-as-Code concepts

Nice to have

  • Experience with large web crawlers