HPC Operations Engineer

Jump Trading · Quant · Mumbai, India +1 · IT Infrastructure + WCW

Jump Trading is seeking an HPC Operations Engineer to provide 24/7 operational support for its Linux HPC compute, storage, and interconnects. The role involves troubleshooting, maintenance, scripting, and vendor management in support of quantitative research in financial markets.

What you'd actually do

  1. Provide front-line operational support for 24/7 Linux HPC compute, storage, and interconnects. The environment includes RDMA fabrics, parallel filesystems, HPC batch schedulers, FUSE filesystems, internal Jump software, and multi-vendor hardware, and it comes with cybersecurity requirements, a challenging and unpredictable client workload, and high user expectations.
  2. Resolve problem reports and answer questions from members of Jump's research community, escalating as needed and managing the entire problem lifecycle.
  3. Respond to alerts in a timely fashion.
  4. Participate in large, coordinated maintenance operations, including during evenings and weekends.
  5. Work on global projects across a wide range of infrastructure.

Skills

Required

  • Linux systems experience
  • programming/scripting language proficiency (e.g., Go, Python, C)
  • root cause analysis
  • verbal and written communication skills
  • collaboration skills
  • independent project management

Nice to have

  • high-performance computing (HPC)
  • parallel filesystems (e.g., Lustre, GPFS)
  • batch systems (e.g., Slurm, Grid Engine)
  • high-performance network interconnects

What the JD emphasized

  • operational work as the primary job function
  • front-line operational support