Senior Data Engineer, Nsv

NVIDIA · Semiconductors · Tel Hai, Israel (+1 other location)

Senior Data Engineer role focused on building scalable data pipelines for AI and HPC clusters, with contributions to agentic AI initiatives. The role involves designing, building, and maintaining ETL/ELT frameworks, optimizing streaming pipelines, ensuring data quality, and supporting self-service analytics. A key part of the role is helping build AI agents that make data and insights more visible and accessible to users.

What you'd actually do

  1. Define and execute the group's technical data roadmap in alignment with the Infra, DevOps, and Performance teams
  2. Design and maintain flexible ETL/ELT frameworks for ingesting, transforming, and classifying cluster verification and telemetry data
  3. Build and optimize streaming pipelines using Apache Spark, Kafka, and Databricks, ensuring high throughput, reliability, and adaptability to evolving data schemas
  4. Ensure data quality and pipeline health through observability standards, schema validation, lineage tracking, monitoring, and alerting
  5. Contribute to the development of AI Agents that enhance the visibility and accessibility of insights and data for our users

Skills

Required

  • B.Sc. or M.Sc. in Computer Science, Data Science, or a related field
  • 5+ years of hands-on experience in data engineering
  • Strong practical experience with Apache Spark (PySpark or Scala) and Databricks
  • Proficiency in Python and SQL for data transformation, automation, and pipeline logic
  • Experience with Apache Kafka, including stream ingestion and event processing
  • Experience with schema evolution, data versioning, and validation frameworks (Delta Lake, Iceberg, or Great Expectations)
  • Strong problem-solving skills and ability to debug and troubleshoot complex data-related issues
  • Strong communication skills and ability to work effectively across teams

Nice to have

  • Experience with real-time analytics frameworks (Spark Structured Streaming, Flink, Kafka Streams)
  • Exposure to hardware, firmware, or embedded telemetry environments
  • Experience with data cataloging or governance tools (DataHub, Collibra, or Alation)
  • Hands-on experience building or deploying AI Agents or LLM-based applications

What the JD emphasized

  • AI Agents
  • data pipelines
  • Apache Spark
  • Kafka
  • Databricks