Scalable Low Power Accelerator for Sparse Recurrent Neural Network

Panshi JIN; Junjie LI; Jingyi WANG; Pengchong LI; Lei XING; Xiaodong LI

doi:10.11959/j.issn.2096-8930.2023045

Application | Updated: 2024-06-03

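- Scalable Low Power Accelerator for Sparse Recurrent Neural Network
- Space-Integrated-Ground Information Networks, 2023, Vol. 4, No. 4, pp. 79-85
- Author affiliations:
  
  1. China Construction Bank Corporation, Beijing 100034
  2. CCB Fintech Co., Ltd., Shanghai 321004
  3. Inspur Electronic Information Industry Co., Ltd., Jinan, Shandong 250000
- Author biographies:
  
  - Panshi JIN (1965- ), male, Chief Information Officer of China Construction Bank Corporation; his work focuses on the strategic planning, coordination, and implementation of information technology systems.
  - Junjie LI (1978- ), male, works at CCB Fintech Co., Ltd.; his research focuses on AI inference technology.
  - Jingyi WANG (1990- ), male, works at CCB Fintech Co., Ltd.; his research focuses on applications of artificial intelligence in financial technology.
  - Pengchong LI (1981- ), male, General Manager of the Network R&D Department at Inspur Electronic Information Industry Co., Ltd.; his research focuses on data center architecture.
  - Lei XING (1981- ), male, Vice President of the Basic Technology Center at CCB Fintech Co., Ltd.; his work focuses on the design and development of distributed architectures.
  - Xiaodong LI (1982- ), male, Deputy Director of the Technical Architecture Management Division, Fintech Department, China Construction Bank Corporation; his work focuses on technical architecture design.
- DOI: 10.11959/j.issn.2096-8930.2023045
  CLC number: TP393
- Published online: 2023-12
  
  Published in print: 2023-12-20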
Panshi JIN, Junjie LI, Jingyi WANG, et al. Scalable Low Power Accelerator for Sparse Recurrent Neural Network[J]. Space-Integrated-Ground Information Networks, 2023, 4(4): 79-85. DOI: 10.11959/j.issn.2096-8930.2023045.

Abstract

Edge computing devices in bank branches are increasingly used for applications such as customer flow analysis, security protection, and risk prevention and control, and the performance and power consumption of AI inference chips have become a major factor in selecting such devices. Data dependence and low data reuse give recurrent neural networks (RNNs) high power consumption, weak inference performance, and low energy efficiency, which makes them hard to run on low-power platforms. To address these problems, this paper implements a voltage-scalable, low-power accelerator for sparse RNNs on an FPGA and verifies it on an edge computing device. First, the sparse RNN is analyzed and a processing array is designed using network compression. Second, because the workload of a sparse RNN is unbalanced, a voltage scaling method is introduced to maintain low power consumption and high throughput. Experiments show that the method significantly improves the system's RNN inference speed and reduces the chip's processing power consumption.
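The abstract notes that the workload of a sparse RNN is unbalanced. The following minimal Python sketch illustrates one way such imbalance can arise; it is not the paper's design. It assumes a compressed sparse row (CSR) layout for a pruned weight matrix and a hypothetical mapping of one matrix row per processing element (row `i` goes to element `i % num_pes`). The function names, the example matrix, and the mapping are all illustrative assumptions.

```python
def to_csr(dense):
    """Convert a dense row-major matrix to CSR arrays: values, column indices, row pointers."""
    values, cols, row_ptr = [], [], [0]
    for row in dense:
        for j, w in enumerate(row):
            if w != 0.0:
                values.append(w)
                cols.append(j)
        row_ptr.append(len(values))
    return values, cols, row_ptr


def spmv(values, cols, row_ptr, x):
    """Compute y = W @ x, touching only the stored nonzero weights."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[cols[k]]
        y.append(acc)
    return y


def pe_workloads(row_ptr, num_pes):
    """Count multiply-accumulates per processing element (assumed mapping: row i -> PE i % num_pes)."""
    loads = [0] * num_pes
    for i in range(len(row_ptr) - 1):
        loads[i % num_pes] += row_ptr[i + 1] - row_ptr[i]
    return loads


# Hypothetical pruned weight matrix: the rows keep different numbers of nonzeros.
W = [
    [0.5, 0.0, 0.0, -1.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
]
x = [1.0, 2.0, 3.0, 4.0]

values, cols, row_ptr = to_csr(W)
y = spmv(values, cols, row_ptr, x)
loads = pe_workloads(row_ptr, num_pes=4)
print(y)      # result of the sparse matrix-vector product
print(loads)  # nonzeros handled by each processing element
```

In this toy case the four elements handle 2, 0, 1, and 4 multiply-accumulates, so three of them sit idle while the busiest one finishes. Uneven load of this kind is the condition under which the abstract introduces voltage scaling.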
