Received: 2021-04-10  Revised: 2021-12-17
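Chinese abstract (translated): To address the high proprietary licensing fees of commercial central processing units (CPUs) and the need to improve convolutional neural network (CNN) performance, this work designs a configurable CNN accelerator based on multi-view parallelism and builds a system on chip (SoC) for the accelerator around the fifth-generation reduced instruction set computer architecture (RISC-V). First, a set of control access and data access interfaces suited to high-speed co-accelerators is added. Second, each CNN operation unit is implemented through multi-view parallelism and structure reuse: different combinations of view parallelism affect the hardware circuit structure of the convolution unit, so multi-view parallelism is achieved by reusing a basic arithmetic structure; the pooling unit consists of row-pooling and column-pooling subunits that share the row-pooling arithmetic structure; for the fully connected unit, the parameters of the fully connected operation are adjusted to fit the hardware structure of the convolution unit, which enables reuse across models. Then, separate register groups are designed for the hardware structures of the different operation units and combined with an open-source RISC-V processor to implement multiple network models. Finally, convolution, pooling and fully connected models are deployed on different platforms, and their run time, throughput and speed are measured. The experimental results show that, for the same convolution structure, the speed ratio of the designed accelerator to the CPU platform is 189. When the convolution operations of Visual Geometry Group (VGG) are deployed on the designed accelerator, throughput reaches 178.6 GOPS. In summary, multi-view parallelism delivers acceleration, and different network models can be implemented by configuring registers.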
Abstract: To address the high cost of proprietary licenses for commercial central processing units (CPUs) and the need to improve the performance of convolutional neural networks (CNNs), a configurable CNN accelerator based on multi-view parallelism was proposed, and a system on chip (SoC) was built around it using RISC-V, the fifth-generation reduced instruction set computer architecture. Firstly, a set of interfaces comprising a control access bus and a data access bus was added for high-speed accelerators. Secondly, each CNN operation unit was implemented using both multi-view parallelism and structure multiplexing. Because different combinations of view parallelism affect the hardware circuit structure of the convolution unit, multi-view parallelism was achieved by reusing a basic arithmetic structure. The pooling unit was composed of row-pooling and column-pooling submodules, which shared the row-pooling structure. For the fully connected unit, the parameters of the fully connected operation were adjusted to fit the hardware structure of the convolution unit, which allowed reuse between models. Then, separate register groups were designed for the hardware structures of the different operation units and combined with an open-source RISC-V processor to realize multiple CNN models. Finally, convolution, pooling and fully connected models were each deployed on different platforms to measure latency, throughput and speedup ratio. The experimental results demonstrated that, for the same convolution structure, the designed accelerator achieved a speedup ratio of 189 over the CPU platform. When the convolution operations of Visual Geometry Group (VGG) were deployed on the designed accelerator, the throughput reached 178.6 GOPS. In summary, multi-view parallelism delivered an acceleration effect, and different CNN models could be implemented by configuring registers.
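The two reuse ideas in the abstract can be illustrated in software. The sketch below is not the authors' hardware design; it is a minimal pure-Python model under stated assumptions: non-overlapping max pooling, a single channel, and a fully connected neuron treated as a convolution whose kernel covers the whole input. All function names (`row_pool`, `col_pool`, `conv2d_valid`, `fully_connected`) and the sample data are hypothetical.

```python
# Illustrative sketch only: names, shapes and data are assumptions,
# not the accelerator's actual interface.

def row_pool(fmap, k):
    """Max over non-overlapping windows of width k along each row."""
    return [[max(row[c:c + k]) for c in range(0, len(row) - k + 1, k)]
            for row in fmap]

def col_pool(fmap, k):
    """Column pooling that reuses row_pool on the transposed map."""
    transposed = [list(col) for col in zip(*fmap)]
    pooled = row_pool(transposed, k)
    return [list(row) for row in zip(*pooled)]

def max_pool2d(fmap, k):
    """Direct k x k max pooling with stride k, used as a reference."""
    rows, cols = len(fmap), len(fmap[0])
    return [[max(fmap[r + i][c + j] for i in range(k) for j in range(k))
             for c in range(0, cols - k + 1, k)]
            for r in range(0, rows - k + 1, k)]

def conv2d_valid(fmap, kernel):
    """Single-channel 'valid' cross-correlation with stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    rows, cols = len(fmap), len(fmap[0])
    return [[sum(fmap[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(cols - kw + 1)]
            for r in range(rows - kh + 1)]

def fully_connected(inputs, weights):
    """One output neuron: dot product of flat inputs and flat weights."""
    return sum(x * w for x, w in zip(inputs, weights))

# Hypothetical 4 x 4 feature map and weights for one output neuron.
fmap = [[1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 7, 2],
        [3, 6, 2, 8]]
weights = [[1, 0, -1, 2],
           [0, 1, 1, 0],
           [2, -1, 0, 1],
           [1, 1, 0, -2]]

# Row pooling followed by column pooling matches direct 2-D pooling.
separable = col_pool(row_pool(fmap, 2), 2)
direct = max_pool2d(fmap, 2)

# A fully connected neuron equals a convolution with a full-size kernel.
flat_in = [v for row in fmap for v in row]
flat_w = [w for row in weights for w in row]
fc_out = fully_connected(flat_in, flat_w)
conv_out = conv2d_valid(fmap, weights)[0][0]

print(separable == direct, fc_out == conv_out)
```

Here `col_pool` calls `row_pool` on the transposed map, which loosely mirrors the shared row-pooling structure, and the full-size kernel shows one way a fully connected layer could be mapped onto a convolution datapath. How the paper actually adjusts the fully connected parameters is not stated in the abstract.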
Keywords: convolutional neural network; multi-view parallelism; configurable; system on chip; reuse; RISC-V
Article ID: 202100299  CLC number: TP391  Document code:
Funding: Sichuan Province Major Science and Technology Project (2018GZDZX0024); Sichuan Science and Technology Program (2020YFG0288)
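Author | Affiliation | E-mail |
YING Sancong | College of Computer Science, Sichuan University, Chengdu 610065, Sichuan | yingsancong@163.com |
PENG Ling | National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University, Chengdu 610065, Sichuan | pl2019031@163.com |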
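About the authors: First author: YING Sancong (b. 1975), male, associate professor, Ph.D.; research interest: intelligent information processing; E-mail: yingsancong@163.com. Corresponding author: PENG Ling, E-mail: pl2019031@163.com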
Cite this article:
应三丛,彭铃.基于多视图并行的可配置卷积神经网络加速器设计[J].工程科学与技术,2022,54(2):188-195.
YING Sancong, PENG Ling. Configurable Convolutional Neural Network Accelerator Based on Multi-view Parallelism[J]. Advanced Engineering Sciences, 2022, 54(2): 188-195.