Senior Data Engineer, Telemetry and Infrastructure

NVIDIA · Semiconductors · Yokneam, Israel

NVIDIA is seeking a Senior Data Engineer to build and maintain data pipelines for R&D telemetry and performance analytics. The role involves designing ETL/ELT frameworks, optimizing streaming pipelines with Spark, Kafka, and Databricks, and implementing data quality and observability standards for large-scale AI and HPC clusters.

What you'd actually do

  1. Define and execute the group’s technical roadmap for data, in alignment with R&D, hardware, and DevOps teams
  2. Design and maintain flexible ETL/ELT frameworks for ingesting, transforming, and classifying telemetry and performance data
  3. Build and optimize streaming pipelines using Apache Spark, Kafka, and Databricks, ensuring high throughput, reliability, and adaptability to evolving data schemas
  4. Implement and maintain observability and data quality standards, including schema validation, lineage tracking, and metadata management
  5. Deliver reliable insights for cluster performance analysis, telemetry visibility, and end-to-end test coverage

Skills

Required

  • 5+ years of hands-on experience in data engineering or backend development
  • Strong practical experience with Apache Spark (PySpark or Scala) and Databricks
  • Expertise with Apache Kafka, including stream ingestion, schema registry, and event processing
  • Proficiency in Python and SQL for data transformation, automation, and pipeline logic
  • Familiarity with ETL orchestration tools (Airflow, Prefect, or Dagster)
  • Experience with schema evolution, data versioning, and validation frameworks (Delta Lake, Iceberg, or Great Expectations)
  • Solid understanding of cloud environments (AWS preferred; GCP or Azure also relevant)
  • Knowledge of streaming and telemetry data architectures in large-scale, distributed systems

Nice to have

  • Exposure to hardware, firmware, or embedded telemetry environments
  • Experience with real-time analytics frameworks (Spark Structured Streaming, Flink, Kafka Streams)
  • Experience with data cataloging or governance tools (DataHub, Collibra, or Alation)
  • Familiarity with CI/CD for data pipelines and infrastructure-as-code (Terraform, GitHub Actions)
  • Experience designing performance metrics data systems (latency, throughput, resource utilization) that support high-volume, high-frequency telemetry at scale

What the JD emphasized

  • massive volumes of real-time telemetry data
  • large-scale AI and HPC clusters
  • high throughput, reliability, and adaptability to evolving data schemas
  • observability and data quality standards
  • schema validation, lineage tracking, and metadata management
  • streaming and telemetry data architectures in large-scale, distributed systems