What you'd actually do

Building fault-tolerant infrastructure that scales across thousands of GPUs and TPUs with near-linear performance

Engineering data pipelines that stream terabytes per second as our models train on petabytes of data from every corner of the global markets

Designing custom kernels that unlock 10x efficiency gains

Co-designing novel architectures with researchers and pioneering cutting-edge approaches to mixed-precision training and model parallelism

Skills

Required

Expertise in large-scale distributed training, with a track record of significant, measurable performance improvements (MFU, throughput, convergence, cost-per-token)
Published research in efficient training methods, scaling laws, architectures, or systems for ML
Background in numerical computing, HPC, or distributed systems, including familiarity with GPUs/TPUs, high-performance networking (NVLink/InfiniBand), Kubernetes/Slurm, and OS internals
Expertise in Python and deep experience with modern deep learning frameworks (PyTorch and/or JAX)
Advanced degree (MS or PhD) in Computer Science, Machine Learning, Physics, Mathematics, or a related quantitative field, or equivalent industry experience at a frontier lab
Ability to balance ambitious research goals with practical engineering constraints
Strong problem-solving skills, results orientation, and excellent collaborative communication
Reliable and predictable availability

Nice to have

Expertise in CUDA kernel development, Triton/Pallas/CuTe DSLs, PyTorch/JAX internals, XLA optimization, or hardware acceleration (FPGA/ASIC)
Knowledge of reinforcement learning, post-training, or fine-tuning techniques
Knowledge of financial markets or trading

What the JD emphasized

massive-scale foundation models

pre-training at scale

thousands of GPUs and TPUs

terabytes per second

petabytes of data

10x efficiency gains

mixed-precision training

model parallelism

published research in efficient training methods, scaling laws, architectures, or systems for ML

Expertise and track record of significant, measurable performance improvements in large-scale distributed training (MFU, throughput, convergence, cost-per-token)

Jump Trading Group is committed to world-class research. We empower exceptional talents in Mathematics, Physics, and Computer Science to seek scientific boundaries, push through them, and apply cutting-edge research to global financial markets. Our culture is unique. Constant innovation requires fearlessness, creativity, intellectual honesty, and a relentless competitive streak. We believe in winning together and unlocking unique individual talent by incentivizing collaboration and mutual respect. At Jump, research outcomes drive more than superior risk-adjusted returns. We design, develop, and deploy technologies that change our world, fund start-ups across industries, and partner with leading global research organizations and universities to solve problems.

Our team is a group of quantitative researchers, engineers, and ML experts leading foundation model research and trading at Jump. Our mission is to combine emerging techniques and original research to generate signals from financial market data and monetize them globally. We are building the future of ML-powered trading through breakthrough foundation models, and we're looking for an exceptional Pre-Training Engineer to join our team.

What You'll Do:

As a Pre-Training Research Engineer, you'll be at the forefront of developing massive-scale foundation models that fundamentally transform how we understand and predict markets, where milliseconds matter and no playbook exists. You'll own and drive the entire training stack: building fault-tolerant infrastructure that scales across thousands of GPUs and TPUs with near-linear performance, engineering data pipelines that stream terabytes per second as our models train on petabytes of data from every corner of the global markets, and designing custom kernels that unlock 10x efficiency gains. You'll co-design novel architectures with researchers and pioneer cutting-edge approaches to mixed-precision training and model parallelism, with the latest-generation hardware at your disposal. This isn't incremental optimization; we're pushing the boundaries of what's possible in pre-training at scale, where your improvements directly impact live trading.

Other duties as assigned or needed.

Benefits

Discretionary bonus eligibility
Medical, dental, and vision insurance
HSA, FSA, and Dependent Care options
Employer Paid Group Term Life and AD&D Insurance
Voluntary Life & AD&D insurance
Paid vacation plus paid holidays
Retirement plan with employer match
Paid parental leave
Wellness Programs

Annual Base Salary Range

$300,000 to $350,000 USD