Master of Science (MS)
CNN, Benchmark Suite, Embedded devices, CUDA, OpenCL, Deep Neural Network
Convolutional Neural Networks (CNNs) are widely used for object recognition and facial recognition because of their remarkable results on these common visual tasks. To evaluate CNN performance on embedded devices effectively, a comprehensive benchmark evaluation environment is essential. Although many benchmark suites are available, they require the installation of various packages and proprietary libraries. This creates a bottleneck when using them in applications that run on resource-constrained hardware such as embedded devices.
In this paper, we propose an evaluation platform that can be used on any platform that supports CUDA and OpenCL. We ran it on an NVIDIA Jetson TX2 embedded board and on commodity hardware without needing any extra proprietary libraries to execute the model. We also achieved a 4.5-fold gain in execution speed with the CUDA and OpenCL model. The model's predictions also match those of the Python-based model exactly, with 100% accuracy. We also provide in-depth statistics about the CNN execution pattern, obtained by executing the model on embedded devices and commodity hardware.
Lahoti, Nikhil, "Low Power MobileNets Acceleration In Cuda And OpenCL" (2019). Master's Projects. 680.