Edge ML Software Engineer (Model Optimization - PICO) - San Jose

ByteDance · Big Tech · San Jose, CA · R&D

Software engineer role focused on optimizing and deploying ML models on edge NPUs in VR/AR devices. The work involves quantization, performance profiling, and hardware-aware optimization to meet latency, memory, and power constraints.

What you'd actually do

  1. Convert and compile ML models for execution on edge NPUs, and apply quantization mechanisms.
  2. Profile and analyze model performance and power consumption on simulators, emulators, and silicon platforms.
  3. Identify bottlenecks related to compute, memory bandwidth, data movement, and scheduling.
  4. Apply hardware-aware optimization strategies, such as quantization, compression and operator fusion, to meet latency, memory and power targets.
  5. Work closely with algorithm, compiler, firmware and hardware teams to debug functional and performance issues.
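
For readers unfamiliar with the quantization work named in items 1 and 4, here is a minimal sketch of symmetric per-tensor int8 post-training quantization. It is not taken from the posting: the function names `quantize_int8` and `dequantize` and the sample weights are hypothetical, and real NPU toolchains usually add calibration data, per-channel scales, and zero points.

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization (illustrative sketch only).

    Returns the integer codes and the scale needed to map them back to floats.
    """
    max_abs = max(abs(v) for v in values)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from int8 codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only to make the arithmetic easy to follow.
weights = [0.5, -1.27, 0.0, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Worst-case rounding error is bounded by half a quantization step.
max_err = max(abs(w - r) for w, r in zip(weights, restored))
```

The trade-off this illustrates is the one the role revolves around: int8 storage takes a quarter of the memory of float32, at the cost of a rounding error of at most half of `scale` per value.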

Skills

Required

  • Python
  • C/C++
  • PyTorch
  • TensorFlow
  • deep learning architectures
  • CNNs
  • Transformers
  • ML accelerator architectures
  • operator fusion
  • memory hierarchies
  • data movement

Nice to have

  • model inference constraints on edge devices
  • PTQ
  • QAT

What the JD emphasized

  • edge NPUs
  • quantization
  • model performance
  • latency
  • power targets

Other signals

  • ML models