Edge ML Software Engineer (compiler-pico) - San Jose

ByteDance · Big Tech · San Jose, CA · R&D

Software Engineer specializing in ML compilers for edge NPU architectures, optimizing ML inference on target hardware under latency, memory, power, and thermal constraints. Requires a strong understanding of compilers and deep learning models; experience with quantization and ML compiler stacks is preferred.

What you'd actually do

  1. Design and implement an ML compiler for proprietary edge NPU architectures so that compiled models meet latency, memory, and power targets.
  2. Implement operator fusion, memory planning, and target-lowering passes that support both static- and dynamic-shape compilation flows.
  3. Apply knowledge of the hardware architecture to optimize latency, memory footprint, and bandwidth, with particular attention to power and thermal constraints.
  4. Work closely with architecture and runtime engineers to define, develop and debug ML inference on target hardware platforms.
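To make item 2 concrete: operator fusion is the kind of graph pass this role involves. A minimal sketch, assuming a toy graph IR (the `Node` class and `fuse_conv_relu` function are hypothetical illustrations, not ByteDance's actual compiler stack):

```python
from dataclasses import dataclass

@dataclass
class Node:
    op: str          # operator name, e.g. "conv" or "relu"
    inputs: list     # names of input tensors
    output: str      # name of the produced tensor

def fuse_conv_relu(graph):
    """Fuse each conv whose output feeds only a relu into one conv_relu node.

    Fusion is legal here only when the conv's output has a single consumer;
    otherwise the intermediate tensor must be materialized for other users.
    """
    # Map each tensor name to the nodes that consume it.
    consumers = {}
    for n in graph:
        for name in n.inputs:
            consumers.setdefault(name, []).append(n)

    fused, absorbed = [], set()
    for n in graph:
        if id(n) in absorbed:
            continue  # this relu was merged into a preceding conv
        if n.op == "conv":
            users = consumers.get(n.output, [])
            if len(users) == 1 and users[0].op == "relu":
                relu = users[0]
                # Replace the conv+relu pair with a single fused node.
                fused.append(Node("conv_relu", n.inputs, relu.output))
                absorbed.add(id(relu))
                continue
        fused.append(n)
    return fused

g = [Node("conv", ["x", "w"], "t0"), Node("relu", ["t0"], "y")]
print([n.op for n in fuse_conv_relu(g)])  # ['conv_relu']
```

A production pass would also handle shape and dtype checks, dominance, and many more patterns, but the single-consumer test above is the core legality condition that real fusion passes share.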

Skills

Required

  • Compiler development
  • ML systems
  • Compiler fundamentals (IR design, graph transformations, scheduling, memory planning)
  • Deep learning model structures (CNNs, Transformers)
  • Hardware concepts (memory, cache, DMA, tiling, vectorization, systolic array)
  • C/C++ or Rust proficiency

Nice to have

  • Quantization concepts
  • ML compiler stacks (torch.compile, MLIR, XLA, IREE, TVM)

What the JD emphasized

  • ML compiler
  • edge NPU
  • ML inference

Other signals

  • ML compiler for edge NPU
  • optimize latency, memory, power
  • ML inference on target hardware