References:
NeurIPS’20: MCUNet: Tiny Deep Learning on IoT Devices
NeurIPS’20: Tiny Transfer Learning: Towards Memory-Efficient On-Device Learning
Three benchmark datasets: Cars, Flowers, Aircraft.
Device: Raspberry Pi 1 (256 MB of memory).
NeurIPS’20: Differentiable Augmentation for Data-Efficient GAN Training
ICLR’20: Once-for-All: Train One Network and Specialize It for Efficient Deployment.
ECCV’20: Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution.
ECCV’20: DataMix: Efficient Privacy-Preserving Edge-Cloud Inference.
ACL’20: HAT: Hardware-Aware Transformers for Efficient Natural Language Processing.
CVPR’20: GAN Compression: Efficient Architectures for Interactive Conditional GANs
Four datasets:
CPU or NVIDIA GPU + CUDA cuDNN.
CVPR’20: APQ: Joint Search for Network Architecture, Pruning and Quantization Policy.
DAC’20: GCN-RL Circuit Designer: Transferable Transistor Sizing with Graph Neural Networks and Reinforcement Learning.
HPCA’20: SpArch: Efficient Architecture for Sparse Matrix Multiplication.
ICLR’20: Lite Transformer with Long-Short Range Attention.
NeurIPS’19: Point-Voxel CNN for Efficient 3D Deep Learning.
NeurIPS’19: Deep Leakage from Gradients.
ICCV’19: TSM: Temporal Shift Module for Efficient Video Understanding.
CVPR’19: HAQ: Hardware-Aware Automated Quantization with Mixed Precision.
ICLR’19: ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware.
ICLR’19: Defensive Quantization: When Efficiency Meets Robustness.
ECCV’18: AMC: AutoML for Model Compression and Acceleration on Mobile Devices.
ICLR’18: Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training.
ICLR’18: Efficient Sparse-winograd Convolutional Neural Networks.
ICLR’17: DSD: Dense-Sparse-Dense Training for Deep Neural Networks.
ICLR’17: Trained Ternary Quantization.
HotChips at MICRO’17: Software-Hardware Co-Design for Efficient Neural Network Acceleration.
EMDNN’16 & FPGA’17: ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA.
O’Reilly, 2016: Compressing and Regularizing Deep Neural Networks: Improving Prediction Accuracy Using Deep Compression and DSD Training.
ISCA’16: EIE: Efficient Inference Engine on Compressed Deep Neural Network.
ICLR’16: Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding.
NIPS’15: Learning both Weights and Connections for Efficient Neural Networks.
ArXiv’16: SqueezeNet: AlexNet-Level Accuracy with 50x Fewer Parameters and <0.5MB Model Size.
ISVLSI’16: Angel-Eye: A Complete Design Flow for Mapping CNN onto Customized Hardware.
ICLR Workshop’16: Hardware-Friendly Convolutional Neural Network with Even-Number Filter Size (4 pages, with Tsinghua Brain).

If you could revise the fundamental principles of computer system design to improve security... what would you change?