High-Performance Computing (HPC) (SA3) (Government)

AT&T · Telecom · Columbia, MD

This role is for a High-Performance Computing (HPC) Systems Administrator supporting the installation, configuration, and networking of Linux- and Windows-based platforms in a large client-based IT enterprise. The position requires expertise in HPC, automated processing systems, distributed software design, and secure hosting and networking solutions, primarily on Linux systems with parallel file systems and high-speed interconnects. Responsibilities include system operations, maintenance, installation, configuration, automation via scripting, troubleshooting, and performance optimization.

What you'd actually do

  1. Administer Linux-based HPC clusters (e.g., Red Hat/CentOS/Rocky/Ubuntu) with parallel file systems (e.g., Lustre/GPFS) and high-speed interconnects (e.g., InfiniBand/Slingshot).
  2. Transition new systems and capabilities into operations (clusters, SMP/MPP, parallel file systems).
  3. Support the HPC and ABS (ABUNDANTSHIELD) SRE teams in accordance with Government policies and procedures.
  4. Operate and maintain systems/services: monitoring, incident response, troubleshooting, and routine maintenance.
  5. Install/configure Linux OS, file systems, and TCP/IP networking; troubleshoot OS and application issues.

Skills

Required

  • B.S. in a technical discipline and 10 years’ experience as a System Administrator
  • DoD 8570 IAT II level certification

Nice to have

  • HPC
  • Linux
  • Windows
  • UNIX
  • parallel file systems
  • high-speed interconnects
  • InfiniBand
  • Slingshot
  • Lustre
  • GPFS
  • Red Hat/CentOS/Rocky/Ubuntu
  • TCP/IP networking
  • Bash scripting
  • Jira
  • Confluence
  • Grafana
  • Prometheus
  • Nagios
  • Slurm
  • Git
  • Salt
  • Ansible
  • system administration interdependencies
  • Service Oriented Architecture (SOA)

What the JD emphasized

  • TS/SCI with polygraph