Master of Science (MS)
Deep Learning, Energy-efficiency, GPU
Graphical processing units (GPUs) achieve high throughput with hundreds of cores for concurrent execution and a large register file for storing the context of thousands of threads. Deep learning algorithms have recently gained popularity for their capability for solving complex problems without programmer intervention. Deep learning algorithms operate with a massive amount of input data that causes high memory access overhead. In the convolutional layer of the deep learning network, there exists a unique pattern of data access and reuse, which is not effectively utilized by the GPU architecture. These abundant redundant memory accesses lead to a significant power and performance overhead. In this thesis, I maintained redundant data in a faster on-chip memory, register file, so that the data that are used by multiple neurons can be directly fetched from the register file without cumbersome system memory accesses. In this method, a neuron’s load instruction is replaced by a shuffle instruction if the data are found from the register file. To enable data sharing in the register file, a new register type was used as a destination register of load instructions. By using the unique ID of the new load destination registers, neurons can easily find their data in the register file. By exploiting the underutilized register file space, this method does not impose any area or power overhead on the register file design. The effectiveness of the new idea was evaluated through exhaustive experiments. According to the results, the new idea significantly improved performance and energy efficiency compared to baseline architecture and shared memory version solution.
Gopalakrishnan Elango, Sindhuja, "Convolutional Neural Network Acceleration on GPU by Exploiting Data Reuse" (2017). Master's Theses. 4800.