Senior Software Engineer - Web Data Team

ZoomInfo · Enterprise · Vancouver, BC · Engineering - Data Engineering

We're looking for a Senior Software Engineer to join the Web Data team, focused on building and operating large-scale web crawling and data extraction infrastructure. The role involves designing and implementing scalable, fault-tolerant pipelines, writing production code in Java and Python, working with cloud infrastructure (GCP/AWS, GKE), and improving system observability and reliability. It emphasizes strong software engineering fundamentals along with experience in data engineering and cloud technologies.

What you'd actually do

  1. Design and implement components of scalable, fault-tolerant web crawling and extraction pipelines
  2. Write clean, production-grade code in Java and Python
  3. Build and operate ETL/ELT pipelines for large-scale data extraction and transformation
  4. Work with cloud infrastructure on GCP and AWS, primarily on GKE
  5. Improve observability, reliability, and operational excellence across the systems you contribute to

Skills

Required

  • 5+ years of professional software engineering experience building production systems
  • Strong CS fundamentals: algorithms, data structures, concurrency, distributed systems
  • Proficiency in Java and/or Python
  • Track record of owning features end-to-end from design through deployment and operation
  • Comfortable making sound architectural decisions at the component level
  • Hands-on experience with cloud data warehouses such as BigQuery or Snowflake
  • Experience designing and operating large-scale ETL/ELT pipelines
  • Experience with orchestration tools such as Apache Airflow
  • Experience with streaming or event-driven systems such as Apache Kafka
  • Production experience on GCP (preferred) or AWS; multi-cloud exposure is a plus
  • Hands-on experience with Kubernetes (GKE/EKS) for distributed workloads
  • Familiarity with infrastructure-as-code tooling such as Terraform
  • Strong communicator who can explain technical decisions clearly
  • Comfortable operating in ambiguity and iterating quickly
  • Bias toward action and pragmatic problem solving
  • Self-starter who thrives in fast-paced, evolving environments

Nice to have

  • Experience with web crawling at scale (Scrapy or similar frameworks)
  • Familiarity with proxy infrastructure, rotation strategies, or anti-bot evasion techniques
  • Experience in extracting structured and unstructured web data from diverse site architectures
  • Knowledge of SERP (Search Engine Results Page) extraction
  • Comfort with AI/LLM-based extraction approaches, applying language models to HTML at scale
  • Experience working in a B2B data company or data-as-a-product environment