Publication Date
Spring 2021
Degree Type
Thesis - Campus Access Only
Degree Name
Master of Science (MS)
Department
Computer Engineering
Advisor
Simon Shim
Keywords
Agent57, Never Give Up, Recurrent Experience Replay in Distributed Reinforcement Learning, Reinforcement Learning
Subject Areas
Artificial intelligence
Abstract
Deep reinforcement learning (DRL) systems have transformed artificial intelligence by solving complex decision-making problems. One such example is learning to play video games from visual sensory information. These DRL systems use deep learning to process sensory input and a reinforcement learning paradigm to make decisions. DRL algorithms are based on the principle of trial and error and do not require additional human supervision. This thesis studies three state-of-the-art DRL algorithms: Recurrent Experience Replay in Distributed Reinforcement Learning (R2D2), Never Give Up (NGU), and Agent57. These algorithms are often difficult to train with limited computational and memory resources. We address this challenge by identifying optimal hyperparameter values for each algorithm that improve learning ability and reduce training time. We train single-process R2D2 and NGU agents on the Breakout and Maze games and a multi-process Agent57 agent on the Atari Alien game. We examine the effects of specific hyperparameters, such as burn-in length, trace length, replay memory size, number of episodes, and exploration rates, on these algorithms and identify their optimal values for training. When tested in a limited computational resource environment, the three algorithms achieved maximum scores of 23 points in Atari Breakout, 10 points in Maze, and 140 points in the Atari Alien game. We confirm the successful training of these algorithms based on the positive game scores achieved in the experiments. We further observe an 80% reduction in training time between the single-process R2D2 agent and the multi-process Agent57 agent. This result confirms previous findings that multi-process agents are faster to train than single-process agents. Based on the successful training of these three algorithms, we conclude that hyperparameter tuning is a significant factor affecting the learning ability of a DRL agent.
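To make the tuned hyperparameters named in the abstract concrete, the following is a minimal illustrative sketch of an R2D2/NGU-style configuration in Python. The parameter names and all numeric values are hypothetical placeholders chosen for illustration; they are not the settings or results reported in the thesis.

```python
# Illustrative sketch only: the kind of hyperparameter configuration tuned
# for a recurrent replay agent (R2D2/NGU-style). Values are hypothetical
# placeholders, not the thesis's reported settings.

r2d2_config = {
    "burn_in_length": 40,           # recurrent-state warm-up steps per replayed sequence
    "trace_length": 80,             # unrolled sequence length used for learning
    "replay_memory_size": 100_000,  # number of stored sequences in the replay buffer
    "num_episodes": 5_000,          # total training episodes
    "epsilon_start": 1.0,           # initial exploration rate
    "epsilon_end": 0.01,            # final exploration rate
    "epsilon_decay_steps": 50_000,  # steps over which epsilon is annealed
}


def exploration_rate(step: int, cfg: dict) -> float:
    """Linearly anneal the exploration rate over epsilon_decay_steps."""
    frac = min(step / cfg["epsilon_decay_steps"], 1.0)
    return cfg["epsilon_start"] + frac * (cfg["epsilon_end"] - cfg["epsilon_start"])


if __name__ == "__main__":
    # The burn-in portion must fit inside the replayed trace.
    assert r2d2_config["burn_in_length"] < r2d2_config["trace_length"]
    print(exploration_rate(25_000, r2d2_config))  # exploration rate midway through the decay
```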
Recommended Citation
Banubakode, Apoorva Sunil, "Advancements in the Field of Reinforcement Learning" (2021). Master's Theses. 5172.
DOI: https://doi.org/10.31979/etd.6m6p-st5s
https://scholarworks.sjsu.edu/etd_theses/5172