Data Engineer

Abridge · Vertical AI · San Francisco, CA · Builder

A Data Engineer to build and optimize large-scale data infrastructure for ML training and evaluation, and to support business decisions and product features. Focus areas: data pipelines, OLAP databases, ELT processes, and data tooling.

What you'd actually do

  1. Build and maintain scalable data services, pipelines, and storage solutions that feed unstructured application data back into ML training and evaluation.
  2. Build and manage OLAP databases, ELT processes, and general data tooling for analytics, business decisions, and product features.
  3. Work closely with a team of frontend and backend engineers, product managers, and analysts.
  4. Optimize data infrastructure to improve the throughput, latency, and reliability of the data system.
  5. Investigate and correct issues identified through data operations monitors, tools, and reports.
  6. Design data integrations and a data quality framework.

Skills

Required

  • Data Engineering
  • Backend Engineering
  • Python
  • Java
  • Scala
  • SQL
  • GCP
  • AWS
  • Azure
  • Structured data
  • Unstructured data
  • Distributed systems
  • Terraform
  • Kubernetes
  • Containerization

Nice to have

  • ML model deployment at scale
  • Data products

What the JD emphasized

  • 5+ years of experience in Data Engineering or Backend Engineering with a focus on data systems
  • Proficient in at least one general-purpose programming language (e.g., Python, Java, Scala) and SQL (any variant)
  • Proficiency with at least one modern cloud provider (GCP, AWS, Azure) and accompanying data services
  • Experience building systems that handle the ingestion, transformation, and management of both structured and unstructured data types
  • Deep knowledge of modern data infrastructure best practices
  • Experience with distributed systems and different distributed processing frameworks
  • Experience with Terraform, Kubernetes, and containerization technologies.
  • Experience in building data products that are well-modeled, documented and easy to understand and maintain.

Other signals

  • build and optimize large scale data infrastructure
  • drive business decisions and machine learning research
  • ML training and evaluation purposes
  • data integrations and data quality framework