HPC Operations Engineer

Jump Trading · Quant · London, United Kingdom · IT Infrastructure + WCW

Jump Trading is seeking an HPC Operations Engineer to manage and support its Linux HPC compute, storage, and interconnects. The role involves front-line operational support, problem-solving for researchers, responding to alerts, participating in maintenance, and writing code for diagnostics and automation. The ideal candidate has 2+ years of Linux systems administration experience and proficiency in a programming language such as Go, Python, or C. Experience with HPC technologies is a plus.

What you'd actually do

  1. Provide front-line operational support for 24/7 Linux HPC compute, storage, and interconnects. The environment includes RDMA fabrics, parallel filesystems, HPC batch schedulers, FUSE filesystems, internal Jump software, and multi-vendor hardware, and comes with cybersecurity requirements, a challenging and unpredictable client workload, and high user expectations.
  2. Solve problem reports and questions posed by members of Jump's research community, escalating as needed and managing the entire problem lifecycle.
  3. Respond to alerts in a timely fashion.
  4. Participate in large, coordinated maintenance operations, including during evenings and weekends.
  5. Work on global projects across a wide range of infrastructure.

Skills

Required

  • Linux systems administration
  • proficiency in a programming/scripting language (e.g., Go, Python, C)
  • root cause analysis
  • verbal and written communication skills
  • collaboration skills
  • independent management of complex projects
  • sense of urgency
  • willingness to perform regular operational maintenance work during evenings and weekends

Nice to have

  • High-performance computing (HPC) experience, including parallel filesystems (e.g., Lustre, GPFS), batch systems (e.g., Slurm, Grid Engine), and high-performance network interconnects

What the JD emphasized

  • operational work as primary job function