Sr. Software Engineer - Perf and Benchmarking

Weights & Biases · Data AI · Bellevue, WA +1 · Technology

Senior Software Engineer focused on performance and benchmarking of AI infrastructure, including Kubernetes-native services, MLPerf runs, and model-serving stacks. The role involves building and improving services that measure latency, throughput, jitter, and cost across CoreWeave’s compute stack, and keeping benchmarking processes reproducible and well documented.

What you'd actually do

  1. Build and improve Kubernetes-native benchmarking services that measure latency, throughput, jitter, and cost-per-request across CoreWeave’s compute stack (a minimal sketch of such a measurement loop follows this list).
  2. Implement and maintain benchmarking workflows for end-to-end MLPerf Training and Inference runs, including workload setup, cluster configuration, runbooks, and result validation.
  3. Lead design reviews and drive architecture within the team; decompose multi-service work into clear milestones.
  4. Mentor junior engineers; review cross-team designs and elevate coding/testing standards.
  5. Help ensure reproducible, well-documented benchmarking processes.
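
A minimal sketch of the kind of measurement loop such a benchmarking service wraps, assuming an HTTP model-serving endpoint. The URL, payload, concurrency, and request counts below are illustrative placeholders, not details from the posting:

    import asyncio
    import statistics
    import time

    import aiohttp  # assumed async HTTP client; any equivalent works

    # Hypothetical endpoint and payload -- placeholders, not from the JD.
    ENDPOINT = "http://localhost:8000/v1/completions"
    PAYLOAD = {"prompt": "hello", "max_tokens": 16}
    CONCURRENCY = 8
    REQUESTS_PER_WORKER = 50

    async def worker(session: aiohttp.ClientSession, latencies: list) -> None:
        for _ in range(REQUESTS_PER_WORKER):
            start = time.perf_counter()
            async with session.post(ENDPOINT, json=PAYLOAD) as resp:
                await resp.read()  # drain the body so timing covers the full response
            latencies.append(time.perf_counter() - start)

    async def main() -> None:
        latencies = []
        wall_start = time.perf_counter()
        async with aiohttp.ClientSession() as session:
            await asyncio.gather(*(worker(session, latencies) for _ in range(CONCURRENCY)))
        wall = time.perf_counter() - wall_start

        latencies.sort()
        n = len(latencies)
        print(f"throughput:     {n / wall:.1f} req/s")
        print(f"p50 latency:    {latencies[n // 2] * 1e3:.1f} ms")
        print(f"p99 latency:    {latencies[min(n - 1, int(n * 0.99))] * 1e3:.1f} ms")
        print(f"jitter (stdev): {statistics.stdev(latencies) * 1e3:.1f} ms")

    if __name__ == "__main__":
        asyncio.run(main())

Cost-per-request then falls out of wall time multiplied by the node’s hourly price; a production version of this loop would also add warmup, fixed percentile definitions, and persisted results for reproducibility.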

Skills

Required

  • Python or Go
  • Kubernetes at production scale
  • CI/CD
  • Observability stacks (Prometheus, Grafana, OpenTelemetry); a small instrumentation sketch follows this list
  • Performance-critical GPU systems (CUDA, NCCL, RDMA, NVLink/PCIe, memory bandwidth)
  • Model-serving stacks (llm-d, vLLM, TensorRT-LLM, Megatron-LM)
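
On the observability side, one common pattern is to expose harness timings as a Prometheus histogram so Grafana can chart percentiles over time. A small sketch using the prometheus_client library; the metric name, buckets, and port are assumptions for illustration:

    import random
    import time

    from prometheus_client import Histogram, start_http_server

    # Hypothetical metric -- name and buckets chosen for illustration only.
    REQUEST_LATENCY = Histogram(
        "benchmark_request_latency_seconds",
        "End-to-end request latency observed by the benchmark harness",
        buckets=(0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5),
    )

    def run_once() -> None:
        """Stand-in for a single benchmark request; swap in a real client call."""
        with REQUEST_LATENCY.time():  # records elapsed seconds into the histogram
            time.sleep(random.uniform(0.01, 0.1))

    if __name__ == "__main__":
        start_http_server(9100)  # exposes a /metrics scrape target on :9100
        while True:
            run_once()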

Nice to have

  • Time-series databases
  • LSM-based storage engines
  • Custom data pipelines
  • Running MLPerf submissions
  • Large-scale audited benchmarks
  • Contributions to OSS projects such as llm-d, vLLM, or PyTorch
  • Benchmarking large GPU fleets
  • Multi-region clusters
  • CUDA kernels
  • NCCL/SHARP
  • RDMA/NUMA
  • GPU interconnect topologies (a quick inspection sketch follows this list)
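
For the interconnect and NUMA items, the topology matrix printed by nvidia-smi is a quick way to sanity-check what a node actually provides (NVLink vs. PCIe paths, NUMA affinity) before attributing benchmark deltas to hardware. A trivial wrapper, assuming nvidia-smi is on PATH:

    import subprocess

    # "nvidia-smi topo -m" prints the pairwise GPU/NIC connectivity matrix
    # (NVLink links, PCIe host bridges, NUMA node affinity).
    def gpu_topology() -> str:
        return subprocess.run(
            ["nvidia-smi", "topo", "-m"],
            capture_output=True, text=True, check=True,
        ).stdout

    if __name__ == "__main__":
        print(gpu_topology())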

What the JD emphasized

  • performance benchmarking
  • MLPerf Training and Inference runs
  • Kubernetes-native benchmarking services
  • end-to-end performance benchmarking publications

Other signals

  • MLPerf
  • Kubernetes
  • GPU systems
  • model-serving stacks