Software Engineer, Inference AI/ML

Weights & Biases · Data AI · Bellevue, WA +1 · Technology

Software Engineer focused on improving the latency, reliability, and cost of model serving on a GPU platform, working with services like Triton, vLLM, and TensorRT-LLM.

What you'd actually do

  1. Implement well-scoped features and fixes in Python/Go/C++ for model-serving services (e.g., Triton, vLLM, TensorRT-LLM, Ray Serve).
  2. Write tests, code comments, and short design docs; participate in code reviews.
  3. Add basic metrics and dashboards; assist with alarms and runbooks.
  4. Follow on-call runbooks and learn incident response in a guided rotation.
  5. Contribute to performance experiments (e.g., request batching, concurrency, caching) with guidance.

Skills

Required

  • Python
  • Go
  • Linux fundamentals
  • Data structures
  • Algorithms
  • Networked services

Nice to have

  • C++
  • Kubernetes
  • PyTorch
  • TensorFlow
  • CUDA
  • Grafana
  • Prometheus
  • OpenTelemetry

What the JD emphasized

  • improve latency, reliability, and cost for model serving

Other signals

  • implement well-scoped changes
  • grow quickly with mentorship