Adaptive Model Quantization Method for Intelligent Internet of Things Terminal
Abstract
With the rapid development of deep learning and the Internet of Everything, the combination of deep learning and mobile terminal devices has become a major research hotspot. While deep learning improves the performance of terminal devices, deploying models on resource-constrained terminals faces many challenges, such as limited computing and storage resources and the inability of deep learning models to adapt to a changing device context. We focus on resource-adaptive quantization of deep models. Specifically, we propose a resource-adaptive mixed-precision model quantization method: it uses a gated network together with the backbone network to construct the model, partitions the model at layer granularity to search for the best quantization policy, and exploits edge devices to reduce the model's resource consumption. To find the optimal quantization policy, an FPGA-based deep learning model deployment is adopted. When the model must be deployed on a resource-constrained edge device, adaptive training is performed according to the resource constraints, and a quantization-aware method is adopted to reduce the accuracy loss caused by quantization. Experimental results show that our method reduces storage space by 50% while retaining 78% accuracy, and reduces energy consumption by 60% on an FPGA device with no more than 2% accuracy loss.
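To make the mixed-precision idea concrete, the following is a minimal sketch of per-layer "fake" quantization, the core operation in quantization-aware training: weights are rounded to a limited number of levels and immediately dequantized, so the rounding error is visible during training. The function name, the example weight values, and the choice of 8-bit vs. 4-bit layers are illustrative assumptions, not the paper's implementation; the paper's gated network would select the per-layer bit-widths adaptively.

```python
def fake_quantize(values, bits):
    """Uniformly quantize a list of floats to 2**bits levels over their
    observed range, then dequantize (hypothetical helper for illustration)."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant layer: nothing to quantize
        return list(values)
    levels = (1 << bits) - 1          # number of quantization steps
    scale = (hi - lo) / levels        # step size between adjacent levels
    return [lo + round((v - lo) / scale) * scale for v in values]

# Illustrative per-layer weights (made-up values):
layer1 = [0.12, -0.5, 0.33, 0.9]
layer2 = [0.7, -0.1, 0.05, -0.8]

# Mixed precision: a sensitive layer kept at 8 bits, a robust one at 4 bits.
q1 = fake_quantize(layer1, 8)   # fine-grained: small rounding error
q2 = fake_quantize(layer2, 4)   # coarse: larger error, smaller storage
```

Training against the fake-quantized weights lets the network compensate for the rounding error before deployment, which is how quantization-aware training limits the accuracy loss relative to quantizing a fully trained model after the fact.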
Date: 01-11-2023
Author: WANG Yuzhan, GUO Bin, WANG Hongli, LIU Sicong
