Senior Software Engineer, Data Acquisition

OpenAI · AI Frontier · San Francisco, CA · Research

Senior Software Engineer on the Data Acquisition team, which handles web crawling, data ingestion, and search services that support model training. The role involves building and deploying highly scalable distributed systems for large datasets and working with Kubernetes infrastructure.

What you'd actually do

  1. Own and lead engineering projects in data acquisition, including web crawling, data ingestion, and search.
  2. Collaborate with other sub-teams, such as Data Processing, Architecture, and Scaling, to ensure smooth data flow and system operability.
  3. Work closely with the legal team on compliance and data-privacy matters.
  4. Develop and deploy highly scalable distributed systems capable of handling petabytes of data.
  5. Architect and implement algorithms for data indexing and search.

Skills

Required

  • 6+ years of industry experience in software development
  • Strong expertise in large stateful distributed systems and data processing
  • Proficiency in Kubernetes and Infrastructure-as-Code concepts
  • BS/MS/PhD in Computer Science or a related field

Nice to have

  • Experience with large web crawlers

What the JD emphasized

  • petabytes of data
  • large web crawlers