Senior Software Engineer, Managed Spark, Opensource

Google · Big Tech · Sunnyvale, CA +1

Senior Software Engineer role focused on building customer-facing features for Managed Service for Apache Spark in the cloud. Responsibilities include driving technical design for performance and lakehouse features, enhancing Apache Spark and related technologies, contributing to documentation, and reviewing code. Requires experience in designing large-scale distributed systems, programming in Java/C++/Golang, and experience with Spark/Hive. Preferred qualifications include experience with data lakes, database optimizations, monitoring solutions, and open-source contributions.

What you'd actually do

  1. Build customer-facing features for Managed Service for Apache Spark (formerly Dataproc) to run Spark in the cloud.
  2. Drive technical design and execution for performance and lakehouse features and enhancements.
  3. Enhance Apache Spark and lakehouse technologies like Iceberg or Delta Lake for performance, reliability, security, and monitoring.
  4. Contribute to documentation or educational content based on product updates and user feedback, and extend open-source technologies like Apache Spark, Flink, Hive, and Trino to improve debuggability, observability, and supportability.
  5. Review code written by other developers and provide feedback to ensure best practices, such as adherence to style guidelines, code check-in standards, accuracy, testability, and efficiency.

Skills

Required

  • 5 years of experience designing, analyzing and troubleshooting large-scale distributed systems.
  • 5 years of programming experience in Java, C++ or Golang.
  • Experience developing with Spark, Hive, or similar engines.
  • Experience in benchmarking and building custom benchmarks.
  • Experience in developing cloud or software as a service (SaaS) products.

Nice to have

  • Master’s degree or PhD in Computer Science or a related technical field.
  • Experience with data lake technologies such as Apache Iceberg, Apache Hudi, and Delta Lake.
  • Experience with database optimizations, such as query and executor optimizations.
  • Experience working with data science tools such as Jupyter notebooks.
  • Experience with OpenTelemetry, JMX, and other monitoring solutions.
  • Contributions to Apache or similar open-source projects such as Iceberg, Delta Lake, Hudi, Spark, Presto, and Flink.