Data Scientist II

The Trade Desk · Media · Irvine, CA · Data Science

This Data Scientist II role focuses on designing, implementing, and optimizing scalable graph-based algorithms and statistical models for large-scale datasets. The role involves developing custom data science solutions, analyzing data, validating third-party data, and automating workflows. The primary focus is on data engineering and building data-driven products within the advertising technology domain.

What you'd actually do

  1. Design, implement, and optimize scalable graph-based algorithms to enhance internal graph products.
  2. Optimize large-scale algorithmic systems operating on terabyte-scale datasets using distributed computing frameworks.
  3. Develop custom data science solutions from first principles, leveraging advanced techniques in probability and statistics, machine learning, and graph mining, especially in contexts where off-the-shelf models are ineffective.
  4. Analyze large datasets and build statistical models at scale on terabytes of data to generate actionable insights for product improvements.
  5. Conduct scientific analysis to validate third-party data and evaluate whether it is useful for products.
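The "scalable graph-based algorithms" in the first bullet typically reduce to primitives like connected components over very large edge lists. Below is a minimal sketch of that primitive using union-find with path compression; the identity-graph-style node names and data are illustrative assumptions, not anything specified in the posting.

```python
# Minimal union-find (disjoint set) over an edge list: the core primitive
# behind a connected-components job on a large graph. Node names are
# hypothetical examples of identity-graph vertices.

def connected_components(edges):
    """Return a mapping from each node to its component representative."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        # Path compression: point each visited node closer to the root.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    for a, b in edges:
        union(a, b)
    return {node: find(node) for node in parent}

edges = [("cookie:1", "device:A"), ("device:A", "email:x"), ("cookie:2", "device:B")]
components = connected_components(edges)
# cookie:1, device:A, email:x fall in one component; cookie:2, device:B in another.
```

At terabyte scale the same logic would run inside a distributed framework such as Spark (e.g., iterative label propagation over an edge RDD/DataFrame) rather than in-memory Python, but the union semantics are the same.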

Skills

Required

  • programming languages
  • SQL
  • cloud computing platforms
  • version control systems and CI/CD pipelines
  • data analysis and visualization for large-scale datasets
  • building and evaluating machine learning models
  • applying probability and statistical methods to build and evaluate data-driven products
  • Agile
  • developing scalable graph algorithms
  • large-scale graph mining
  • distributed computing frameworks including Spark
  • workflow orchestration tools

What the JD emphasized

  • scalable graph-based algorithms
  • large-scale algorithmic systems
  • terabyte-scale datasets
  • custom data science solutions from first principles
  • advanced techniques in probability and statistics, machine learning, and graph mining
  • large datasets and build statistical models at scale
  • validate third-party data
  • best-subset selection using information theory-based approaches
  • workflow orchestration tools
  • end-to-end data products lifecycle

Other signals

  • design, implement, and optimize scalable graph-based algorithms
  • optimize large-scale algorithmic systems operating on terabyte-scale datasets
  • develop custom data science solutions from first principles, leveraging advanced techniques in probability and statistics, machine learning, and graph mining
  • analyze large datasets and build statistical models at scale on terabytes of data
  • conduct scientific analysis to validate third-party data and evaluate whether it is useful for products
  • conduct best-subset selection using information theory-based approaches
  • automate and monitor workflows using orchestration tools (e.g. Apache Airflow)
  • own the end-to-end data products lifecycle
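The "best-subset selection using information theory-based approaches" signal most likely refers to scoring candidate feature subsets with an information criterion such as BIC. Here is a hedged sketch under that assumption: exhaustive subset search over a small synthetic OLS problem, with BIC computed from the residual sum of squares. The data, feature count, and criterion choice are all illustrative.

```python
# Illustrative best-subset selection scored by BIC (an information-theoretic
# criterion). Data are synthetic; only features 0 and 2 carry signal.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=n)

def bic(X_sub, y):
    """BIC for an OLS fit: n * log(RSS / n) + k * log(n)."""
    n_obs, k = X_sub.shape
    beta, *_ = np.linalg.lstsq(X_sub, y, rcond=None)
    rss = float(np.sum((y - X_sub @ beta) ** 2))
    return n_obs * np.log(rss / n_obs) + k * np.log(n_obs)

best_subset, best_score = None, np.inf
for size in range(1, p + 1):
    for subset in itertools.combinations(range(p), size):
        score = bic(X[:, subset], y)
        if score < best_score:
            best_subset, best_score = subset, score

print(best_subset)  # the true features 0 and 2 should dominate the selection
```

Exhaustive search is only feasible for small p; at the feature counts implied by terabyte-scale datasets, greedy forward/backward selection or penalized regression would stand in for the full enumeration.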