Production Engineer, Support Tooling (Tooling and Frameworks)

Weights & Biases · Data AI · Bellevue, WA +3 · Technology

This role focuses on building and operating internal tooling for customer support, specifically developing AI-assisted workflows and AI-powered assistants to improve case triage, intelligent routing, knowledge retrieval, and resolution quality. The goal is to accelerate root-cause discovery and enhance customer experience.

What you'd actually do

  1. Design, build, and own support-facing tools for case triage, intelligent routing, and expert engagement, integrating with incident and change management workflows.
  2. Develop AI-powered assistants and automations that accelerate root-cause discovery, knowledge retrieval, and resolution quality.
  3. Create and maintain dashboards, alerts, and signals that surface tooling issues early; integrate observability into new tooling to reduce MTTR.
  4. Build self-service and guided diagnostics that empower Support/CX to resolve common issues and collect high-quality context for escalations.
  5. Codify reliability and support practices into services, APIs, and Kubernetes-native controllers/operators where appropriate.

Skills

Required

  • 4+ years of software or infrastructure engineering experience building and operating production services.
  • Proficiency in Go or Python (or equivalent experience).
  • Strong fundamentals in Linux, containers, and Kubernetes; comfortable debugging distributed systems.
  • Experience with observability (metrics/logs/traces) and using data to improve reliability and support outcomes.
  • Demonstrated experience with incident management and steady-state operational excellence (e.g., progressive delivery, testing strategies, error budgets, fault-tolerant design).
  • Comfort collaborating with multiple stakeholders (Support/CX, Product, SRE, and service owners).

Nice to have

  • Experience integrating or building support/operations tooling (e.g., ticketing/incident systems, status pages, knowledge management, chat/alerting integrations).
  • Experience automating manual workflows and stitching together productivity platforms.
  • Familiarity with AI/ML tooling for retrieval, summarization, or copilot-style assistance.
  • Experience codifying operational practices into Kubernetes controllers, operators, or platform services.

What the JD emphasized

  • AI-assisted workflows
  • AI-powered assistants
  • retrieval
  • summarization
  • copilot-style assistance
