Convolutional neural networks (CNNs) require significant computing power during inference. Smartphones, for example, may not run a facial recognition system or search algorithm smoothly due to a lack of resources and supporting hardware. Methods for reducing memory size and increasing execution speed have been explored, but choosing effective techniques for an application requires extensive knowledge of the network architecture. This paper proposes a general approach to preparing a compressed deep neural network processor for inference with minimal additions to existing microprocessor hardware. To show the benefits of the proposed approach, an example CNN for synthetic aperture radar target classification is modified and complementary custom processor instructions are designed. The modified CNN is examined to show the effects of the modifications, and the custom processor instructions are profiled to illustrate the potential performance increase they provide.
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Joshua Misko, Shrikant S. Jadhav, and Youngsoo Kim. "Extensible Embedded Processor for Convolutional Neural Networks" Scientific Programming (2021). https://doi.org/10.1155/2021/6630552