Software Engineer, Model Performance Systems

Baseten · Data AI · San Francisco, CA · EPD

A Software Engineer role focused on building and optimizing AI inference infrastructure: benchmarking, hardware profiling, and developing automated testing and monitoring tools for LLMs.

What you'd actually do

  1. Run and automate standard LLM quality benchmarks (GSM8K, MMLU) alongside custom performance suites for specific workloads (e.g., long-context window, KV cache reuse).
  2. Create automated acceptance tests for new GPU clusters across x86 and ARM systems, measuring GPU memory bandwidth and both single-node and multi-node networking performance.
  3. Develop and maintain internal GPU-enabled development environments (similar to GitHub Codespaces), ensuring the team has seamless, high-performance "dev machines" optimized for model experimentation.
  4. Build and contribute to tools such as InferenceMAX and genai-bench to automate model evaluation and optimization.
  5. Use PyTorch Profiler and NVIDIA Nsight Systems to collect performance profiles, identify bottlenecks, and debug the NVIDIA compute/networking stack.
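The benchmark automation in items 1 and 4 boils down to harnesses that time workloads repeatedly and report latency/throughput statistics. A minimal, standard-library-only sketch of such a harness is below; `run_benchmark` is a hypothetical name, and a real suite would invoke a model's generate endpoint (or a genai-bench scenario) instead of an arbitrary callable:

```python
import statistics
import time


def run_benchmark(fn, *, warmup: int = 3, iters: int = 10) -> dict:
    """Time `fn` over several iterations and report latency statistics.

    A generic micro-benchmark harness sketch; in an LLM performance
    suite, `fn` would be a request against the serving stack.
    """
    for _ in range(warmup):  # warm caches before measuring
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    samples.sort()
    return {
        "p50_s": statistics.median(samples),
        "p95_s": samples[min(iters - 1, int(iters * 0.95))],
        "mean_s": statistics.fmean(samples),
    }


if __name__ == "__main__":
    stats = run_benchmark(lambda: sum(range(10_000)))
    print({k: round(v, 6) for k, v in stats.items()})
```

Reporting tail latency (p95) alongside the median matters for inference workloads, where occasional slow requests dominate user-perceived performance.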

Skills

Required

  • Python
  • Performance Benchmarking
  • Infrastructure Validation
  • GPU hardware
  • Networking performance
  • Model optimization

Nice to have

  • C++
  • Quantization
  • Speculative decoding
  • Disaggregated serving
  • Kernel-level optimizations

What the JD emphasized

  • GPU-enabled development environments
  • InferenceMAX
  • genai-bench
  • PyTorch Profiler
  • NVIDIA Nsight Systems

Other signals

  • building inference infrastructure
  • optimizing model performance
  • developing performance benchmarks