Sr. Software Engineer - Performance

Databricks · Data AI · San Francisco, CA · Engineering - Pipeline

Databricks is seeking a Sr. Software Engineer focused on performance to join their team. This role involves evaluating product performance, identifying bottlenecks, and collaborating with engineers to solve performance and scalability issues across their large-scale data and AI infrastructure platform. Responsibilities include setting performance targets, developing benchmarks, analyzing competitive performance, and mitigating performance problems for customers.

What you'd actually do

  1. Identify performance limitations across the entire stack, based on telemetry, customer signals, PoCs, and competitive benchmarks, whose resolution will yield the best-performing system in the industry. Dimensions include latency, data and compute scalability, concurrency, cost, and price-to-performance ratio. Impact spans all cloud providers and all major areas.
  2. Set performance expectations for all cross-cutting efforts early on, through specialized benchmarks that capture the intended customer user journeys, and make sure they are met before deployment to customers.
  3. Understand the performance characteristics of the compute instance types, storage layers, and all cloud services Databricks depends on, and deploy optimal solutions to meet customer demand.
  4. Work with customers to root-cause and mitigate performance problems during production, previews, and PoCs.

Skills

Required

  • BS (or higher degree) in Computer Science, or a related field
  • Experience in the performance analysis discipline. Ability to identify performance issues, root-cause problems, and propose potential solutions.
  • Experience in software development, preferably in large scale distributed systems
  • Ability to measure and document the impact of performance features on existing customers, such as possible regressions for certain workloads, their extent, and which customers will be affected.
  • Ability to build strong working relationships with developers and field engineers to facilitate triaging and mitigation of performance problems.

What the JD emphasized

  • performance
  • scalability
  • performance analysis discipline
  • large scale distributed systems