Edge ML Software Engineer (compiler-pico) - San Jose

ByteDance · Big Tech · San Jose, CA · R&D

Software Engineer specializing in ML compilers for edge NPU architectures, optimizing ML inference on target hardware under latency, memory, power, and thermal constraints. Requires a strong understanding of compilers and deep learning models; experience with quantization and ML compiler stacks is preferred.

What you'd actually do

  1. Design and implement an ML compiler for proprietary edge NPU architectures so that compiled models meet latency, memory, and power targets.
  2. Implement operator fusion, memory planning, and target-lowering passes that support both static- and dynamic-shape compilation flows.
  3. Apply knowledge of the hardware architecture to optimize latency, memory footprint, and bandwidth, with particular attention to power and thermal constraints.
  4. Work closely with architecture and runtime engineers to define, develop and debug ML inference on target hardware platforms.
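To make item 2 concrete: operator fusion is the kind of graph pass this role involves. A minimal sketch, assuming a toy graph IR (the `Node` class and `fuse_conv_relu` function are hypothetical illustrations, not ByteDance's actual compiler stack):

```python
from dataclasses import dataclass

@dataclass
class Node:
    op: str          # operator name, e.g. "conv" or "relu"
    inputs: list     # names of input tensors
    output: str      # name of the produced tensor

def fuse_conv_relu(graph):
    """Fuse each conv whose output feeds only a relu into one conv_relu node.

    Fusion is legal here only when the conv's output has a single consumer;
    otherwise the intermediate tensor must be materialized for other users.
    """
    # Map each tensor name to the nodes that consume it.
    consumers = {}
    for n in graph:
        for name in n.inputs:
            consumers.setdefault(name, []).append(n)

    fused, absorbed = [], set()
    for n in graph:
        if id(n) in absorbed:
            continue  # this relu was merged into a preceding conv
        if n.op == "conv":
            users = consumers.get(n.output, [])
            if len(users) == 1 and users[0].op == "relu":
                relu = users[0]
                # Replace the conv+relu pair with a single fused node.
                fused.append(Node("conv_relu", n.inputs, relu.output))
                absorbed.add(id(relu))
                continue
        fused.append(n)
    return fused

g = [Node("conv", ["x", "w"], "t0"), Node("relu", ["t0"], "y")]
print([n.op for n in fuse_conv_relu(g)])  # ['conv_relu']
```

A production pass would also handle shape and dtype checks, dominance, and many more patterns, but the single-consumer test above is the core legality condition that real fusion passes share.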

Skills

Required

  • Compiler development
  • ML systems
  • Compiler fundamentals (IR design, graph transformations, scheduling, memory planning)
  • Deep learning model structures (CNNs, Transformers)
  • Hardware concepts (memory, cache, DMA, tiling, vectorization, systolic array)
  • C/C++ or Rust proficiency

Nice to have

  • Quantization concepts
  • ML compiler stacks (torch.compile, MLIR, XLA, IREE, TVM)

What the JD emphasized

  • ML compiler
  • edge NPU
  • ML inference

Other signals

  • ML compiler for edge NPU
  • optimize latency, memory, power
  • ML inference on target hardware