Distinguished Engineer, Apache Spark

NVIDIA · Semiconductors · Santa Clara, CA +1

Distinguished Engineer role focused on accelerating Apache Spark using GPUs, involving architecture, design, and implementation of big-data frameworks. The role requires deep expertise in distributed systems, open-source contributions (Spark, Hadoop, etc.), and building CUDA/C++ libraries. It emphasizes collaboration with partners and presenting at industry events.

What you'd actually do

  1. Lead the architecture, design and implementation of accelerated Apache Spark and related big-data frameworks
  2. Engage open source communities (including Apache Spark, RAPIDS, Apache Iceberg, Delta Lake and UCX) for technical discussion and contribution, and engage new communities where we may not have a strong presence yet
  3. Work with NVIDIA partners to deploy GPU-enabled data analytics solutions in public cloud or on-premises clusters
  4. Present technical solutions at industry conferences and meetups
  5. Collaborate with distributed systems teams to define solutions to distributed processing challenges at large scale

Skills

Required

  • BS, MS, or PhD in Computer Science, Computer Engineering, or closely related field (or equivalent experience)
  • 17+ years of work or research experience in software development
  • Prior experience in delivering complex software projects as a lead architect
  • Outstanding technical skills in designing and implementing high-quality distributed systems
  • Excellent programming skills in C++, Java, and/or Scala
  • Highly motivated with strong interpersonal and communication skills
  • 5+ years of working experience with key open source big-data projects as a contributor or committer to Apache Spark, Apache Hadoop, Apache Flink, Apache Kafka, Apache Hive, Apache Arrow, Delta Lake
  • Excellent knowledge of distributed system schedulers: Kubernetes, Hadoop YARN, Apache Spark
  • Able to delve into a new area and quickly come up to speed
  • Able to work with teams across boundaries and geographies

Nice to have

  • Working experience in designing and developing columnar query engines
  • Committership at major open source projects (such as Apache Spark, Apache Hadoop, Apache Flink)
  • Working experience with acceleration libraries (CUDA, RAPIDS, UCX)

What the JD emphasized

  • 5+ years of working experience with key open source big-data projects as a contributor or committer to Apache Spark, Apache Hadoop, Apache Flink, Apache Kafka, Apache Hive, Apache Arrow, Delta Lake
  • Excellent knowledge of distributed system schedulers: Kubernetes, Hadoop YARN, Apache Spark
  • Able to delve into a new area and quickly come up to speed
  • Able to work with teams across boundaries and geographies