2020 Scylla

INFOCOM’20: SCYLLA: QoE-aware Continuous Mobile Vision with FPGA-based Dynamic Deep Neural Network Reconfiguration.

Overview

In one sentence: Use an FPGA to switch quickly between different neural network models.

Problem: it is hard to run multiple neural network models efficiently and concurrently on the same device.

  • Heterogeneous multi-tenancy;
  • GPUs (SIMT) need several seconds to switch from one neural network model to another;
  • ASIC AI chips are not designed for concurrency and heterogeneity.

Solution: Use an FPGA to run several neural network models.

  • Several models can fit on one FPGA board;
  • A model running on the FPGA can be replaced with a new model very quickly (in 80-90 ms on a Xilinx ZCU102).
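To get a feel for why switch time matters, here is a rough back-of-the-envelope sketch. It assumes exactly one model switch per fixed interval and reports the fraction of wall-clock time lost to switching. The 90 ms figure comes from the notes above; the 3 s GPU figure and the 10 s interval are placeholders I picked for illustration ("several seconds" is all the notes say), and `switch_overhead` is a hypothetical helper, not something from the paper.

```python
def switch_overhead(switch_time_s: float, interval_s: float) -> float:
    """Fraction of wall-clock time spent switching models.

    Assumes exactly one switch happens in every window of `interval_s` seconds.
    The result is capped at 1.0 (the device does nothing but switch).
    """
    if switch_time_s < 0 or interval_s <= 0:
        raise ValueError("switch_time_s must be >= 0 and interval_s must be > 0")
    return min(1.0, switch_time_s / interval_s)


# Illustrative inputs: 90 ms is from the notes; 3 s and 10 s are assumptions.
FPGA_SWITCH_S = 0.09
GPU_SWITCH_S = 3.0
INTERVAL_S = 10.0

fpga_loss = switch_overhead(FPGA_SWITCH_S, INTERVAL_S)
gpu_loss = switch_overhead(GPU_SWITCH_S, INTERVAL_S)
print(f"FPGA: {fpga_loss:.1%} of time lost, GPU: {gpu_loss:.1%} of time lost")
```

Under these assumed numbers the FPGA loses under 1% of its time to switching while the GPU loses about 30%, which is the intuition behind the paper's pitch; real ratios depend on how often the workload actually changes models.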

Challenges:

  • It is hard to switch quickly between different QoE profiles. Addressed by:
    • Pre-generating a pool of FPGA design and DNN model profiles;
    • Dynamically reconfiguring the FPGA.
  • It is hard to optimize overall performance across multiple concurrently running tasks. Addressed by:
    • Encoding the QoE metrics together;
    • Using a QoE-aware scheduler;
    • Selecting the “optimal” software-hardware configuration to achieve the best QoE.
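The "pool of profiles plus QoE-aware selection" idea can be sketched as follows. This is only my illustration: the weighted-sum QoE, the normalization, the `Profile` fields, and all the numbers are assumptions, not the paper's actual formulation or measurements. The point is just that once the profiles are pre-generated, picking a configuration is a cheap lookup over a small table.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Profile:
    """One pre-generated (FPGA design, DNN model) pairing with its metrics."""
    name: str
    latency_ms: float
    accuracy: float   # in [0, 1]
    energy_mj: float


def qoe_score(p: Profile, w_t: float, w_a: float, w_e: float,
              max_latency_ms: float, max_energy_mj: float) -> float:
    """Hypothetical QoE: weighted sum of normalized terms, higher is better."""
    t_term = 1.0 - p.latency_ms / max_latency_ms   # lower latency scores higher
    e_term = 1.0 - p.energy_mj / max_energy_mj     # lower energy scores higher
    return w_t * t_term + w_a * p.accuracy + w_e * e_term


def select_profile(pool: Sequence[Profile],
                   w_t: float, w_a: float, w_e: float) -> Profile:
    """Return the profile in the pool with the highest QoE score."""
    if not pool:
        raise ValueError("profile pool is empty")
    max_t = max(p.latency_ms for p in pool)
    max_e = max(p.energy_mj for p in pool)
    return max(pool, key=lambda p: qoe_score(p, w_t, w_a, w_e, max_t, max_e))


# Made-up metrics for three designs; not taken from the paper.
POOL = [
    Profile("design1", latency_ms=40.0, accuracy=0.70, energy_mj=100.0),
    Profile("design2", latency_ms=80.0, accuracy=0.80, energy_mj=200.0),
    Profile("design3", latency_ms=160.0, accuracy=0.90, energy_mj=400.0),
]

print(select_profile(POOL, w_t=1.0, w_a=0.0, w_e=0.0).name)  # latency-only user
print(select_profile(POOL, w_t=0.0, w_a=1.0, w_e=0.0).name)  # accuracy-only user
```

A latency-only weighting picks the fastest design and an accuracy-only weighting picks the most accurate one; mixed weights land somewhere in between, and the chosen profile would then be loaded via dynamic reconfiguration.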

Key techniques

  • Run multiple DNN models on the FPGA and reconfigure it to replace models on the fly.

Evaluation

Device: Xilinx ZCU102 board ($2495)

Three DNN models (Design 1, 2, 3; no names given?) that use generic convolution kernels with different degrees of parallelism.

  • Based on CHaiDNN from Xilinx;
  • Evaluated the FPGA resource usage, DNN model accuracy, and energy cost.

Task scheduling evaluation:

  • Written in C++ and run on the ARM co-processor;
  • Three tasks: Object Detection, License Plate Recognition, Car Type Classification;
  • Three optimization sub-goals: T (time/latency), A (accuracy), E (energy);
  • Keep one goal fixed while exploring the other two;
  • Latency-bound experiment: with SCYLLA, more applications meet their deadlines;
  • Compared with a CPU solution (Caffe framework).
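One way to read "keep one goal fixed, explore the other two" is: bound T, then look at the accuracy/energy trade-off among the configurations that meet the bound. The sketch below does that with a simple Pareto filter. This is my interpretation, not the paper's evaluation code, and the configurations and numbers are invented.

```python
from typing import List, NamedTuple, Sequence


class Config(NamedTuple):
    name: str
    latency_ms: float
    accuracy: float
    energy_mj: float


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b in accuracy and energy, and strictly better in one."""
    no_worse = a.accuracy >= b.accuracy and a.energy_mj <= b.energy_mj
    better = a.accuracy > b.accuracy or a.energy_mj < b.energy_mj
    return no_worse and better


def tradeoff_under_latency(configs: Sequence[Config],
                           latency_bound_ms: float) -> List[Config]:
    """Fix T (latency bound), then return the accuracy/energy Pareto front."""
    feasible = [c for c in configs if c.latency_ms <= latency_bound_ms]
    front = [c for c in feasible
             if not any(dominates(other, c) for other in feasible)]
    return sorted(front, key=lambda c: c.energy_mj)


# Invented configurations for illustration only.
CONFIGS = [
    Config("a", latency_ms=30.0, accuracy=0.70, energy_mj=100.0),
    Config("b", latency_ms=50.0, accuracy=0.80, energy_mj=180.0),
    Config("c", latency_ms=50.0, accuracy=0.75, energy_mj=220.0),   # dominated by b
    Config("d", latency_ms=120.0, accuracy=0.90, energy_mj=400.0),  # misses a 100 ms bound
]

for cfg in tradeoff_under_latency(CONFIGS, latency_bound_ms=100.0):
    print(cfg.name, cfg.accuracy, cfg.energy_mj)
```

With a 100 ms bound, only "a" and "b" remain: "d" misses the deadline and "c" costs more energy than "b" for lower accuracy. Loosening the bound brings "d" back as the high-accuracy, high-energy end of the trade-off.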

Can we do it?

Lele: Probably, since this is a more systems-oriented paper and I understand most of the concepts. The key challenge would be FPGA programming: I have only a little experience with it and am not sure how long it would take to get a DNN model running on one.

Questions or new ideas?

Lele: However novel you all think it is, this style of work fits my experience well: it uses the ML algorithms as a black box instead of trying to change the algorithms themselves. In that sense, this paper differs from the NAS-related papers we have read, where the changes to the NAS algorithms are the novelty.

More

Created Nov 19, 2020 // Last Updated Aug 31, 2021
