HPC Engineer

Weights & Biases · Data AI · New York, NY · Technology

This role supports large-scale data center deployments of NVLink systems. The work covers hardware and software lifecycle management, building automation tooling, and troubleshooting complex network and server issues. It requires strong networking fundamentals, Linux administration skills, and proficiency in a scripting language such as Python or Go.

What you'd actually do

  1. Support the deployment of NVLink systems across large data center environments.
  2. Support the full lifecycle management of NVLink hardware and software components.
  3. Build and maintain tooling that automates and streamlines deployment, monitoring, and troubleshooting workflows.
  4. Diagnose and resolve performance, connectivity, and stability issues in complex environments.
  5. Collaborate with internal teams and external customers worldwide.

Skills

Required

  • NVLink systems support
  • Data center deployment
  • Hardware and software lifecycle management
  • Automation tooling
  • Network troubleshooting
  • Server hardware troubleshooting
  • Linux system administration
  • Python
  • Go
  • Ansible
  • Complex application troubleshooting
  • Communication skills
  • Collaboration skills

Nice to have

  • InfiniBand networking
  • Large-scale environment management
  • Redfish API
  • NVUE
  • SONiC
  • Grafana
  • PromQL

What the JD emphasized

  • NVLink systems
  • Networking fundamentals
  • Linux system administration
  • Python
  • Go
  • Ansible
  • InfiniBand networking
  • Grafana/PromQL