Research Engineer, Pre-training

Jump Trading · Quant · Chicago, IL +1 · Front Office

Research Engineer focused on pre-training massive-scale foundation models for financial markets. The work involves building fault-tolerant infrastructure, engineering data pipelines, and designing custom kernels for efficiency. The role requires expertise in large-scale distributed training, published research in efficient training methods, and proficiency in Python and deep learning frameworks.

What you'd actually do

  1. Building fault-tolerant infrastructure that scales across thousands of GPUs and TPUs with near-linear performance
  2. Engineering data pipelines that stream terabytes per second as our models train on petabytes of data from every corner of the global markets
  3. Designing custom kernels that unlock 10x efficiency gains
  4. Co-designing novel architectures with researchers and pioneering cutting-edge approaches to mixed-precision training and model parallelism

Skills

Required

  • Expertise in large-scale distributed training, with a track record of significant, measurable performance improvements (MFU, throughput, convergence, cost-per-token)
  • Published research in efficient training methods, scaling laws, architectures, or systems for ML
  • Background in numerical computing, HPC, or distributed systems, including familiarity with GPUs/TPUs, high-performance networking (NVLink/InfiniBand), Kubernetes/Slurm, and OS internals
  • Expertise in Python and deep experience with modern deep learning frameworks (PyTorch and/or JAX)
  • Advanced degree (MS or PhD) in Computer Science, Machine Learning, Physics, Mathematics, or a related quantitative field, or equivalent industry experience at a frontier lab
  • Ability to balance ambitious research goals with practical engineering constraints
  • Strong problem-solving skills, results orientation, and excellent collaborative communication
  • Reliable and predictable availability

Nice to have

  • Expertise in: CUDA kernel development, Triton/Pallas/CuTe DSLs, PyTorch/JAX internals, XLA optimization, or hardware acceleration (FPGA/ASIC)
  • Knowledge of reinforcement learning, post-training, or fine-tuning techniques
  • Knowledge of financial markets or trading

What the JD emphasized

  • massive-scale foundation models
  • pre-training at scale
  • thousands of GPUs and TPUs
  • terabytes per second
  • petabytes of data
  • 10x efficiency gains
  • mixed-precision training
  • model parallelism
  • published research in efficient training methods, scaling laws, architectures, or systems for ML
  • Expertise and track record of significant, measurable performance improvements in large-scale distributed training (MFU, throughput, convergence, cost-per-token)

Other signals

  • foundation model research
  • pre-training at scale
  • transform how we understand and predict markets