
Publication Date

Spring 2021

Degree Type

Thesis - Campus Access Only

Degree Name

Master of Science (MS)


Department

Computer Engineering


Advisor

Simon Shim


Keywords

Agent57, Never Give Up, Recurrent Experience Replay in Distributed Reinforcement Learning, Reinforcement Learning

Subject Areas

Artificial intelligence


Abstract

Deep reinforcement learning (DRL) systems have transformed artificial intelligence by solving complex decision-making problems. One example is learning to play video games from visual sensory information. These DRL systems use deep learning to process sensory information and a reinforcement learning paradigm to make decisions. DRL algorithms are based on the principle of trial and error and do not require additional human supervision. This thesis studies three state-of-the-art DRL algorithms: Recurrent Experience Replay in Distributed Reinforcement Learning (R2D2), Never Give Up (NGU), and Agent57. These algorithms are often difficult to train with limited computational and memory resources. We address this challenge by identifying, for each algorithm, hyperparameter values that improve its learning ability and reduce its training time. We train single-process R2D2 and NGU agents on Breakout and Maze, and a multi-process Agent57 agent on the Atari game Alien. We examine the effects of specific hyperparameters (burn-in length, trace length, replay memory size, number of episodes, and exploration rates) on these algorithms and identify their optimal values for training. When we tested the three algorithms in an environment with limited computational resources, we achieved maximum scores of 23 points in Atari Breakout, 10 points in Maze, and 140 points in Atari Alien. We take the positive game scores achieved in these experiments as confirmation that the algorithms trained successfully. We further observe an 80% drop in training time between the single-process R2D2 agent and the multi-process Agent57 agent. This result confirms earlier findings that multi-process agents are faster to train than single-process agents. Based on the successful training of these three algorithms, we conclude that hyperparameter tuning is a significant factor in the learning ability of a DRL agent.
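
To make two of the hyperparameters named above more concrete, the following is a minimal sketch of how burn-in length and trace length can shape the sequences drawn from a recurrent replay buffer. It is a generic illustration, not the thesis's implementation: the function names `make_sequences` and `split_burn_in` are hypothetical, and it assumes that "trace length" means the number of steps per sequence that are actually trained on, with the burn-in steps used only to warm up the recurrent state.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def make_sequences(episode: Sequence[T], burn_in: int, trace_length: int) -> List[List[T]]:
    """Cut an episode into windows of burn_in + trace_length steps.

    Consecutive windows advance by trace_length, so they overlap by
    burn_in steps. Trailing steps that do not fill a whole window are
    dropped in this simplified sketch.
    """
    if burn_in < 0 or trace_length <= 0:
        raise ValueError("burn_in must be >= 0 and trace_length must be > 0")
    window = burn_in + trace_length
    sequences = []
    for start in range(0, len(episode) - window + 1, trace_length):
        sequences.append(list(episode[start:start + window]))
    return sequences


def split_burn_in(sequence: Sequence[T], burn_in: int) -> Tuple[List[T], List[T]]:
    """Split one replayed sequence into (burn-in prefix, training suffix).

    The prefix would only be fed through the recurrent network to rebuild
    its hidden state; the loss would be computed on the suffix alone.
    """
    if not 0 <= burn_in < len(sequence):
        raise ValueError("burn_in must be >= 0 and shorter than the sequence")
    return list(sequence[:burn_in]), list(sequence[burn_in:])


if __name__ == "__main__":
    # Hypothetical episode of 10 time steps, labeled by step index.
    episode = list(range(10))
    for seq in make_sequences(episode, burn_in=2, trace_length=4):
        warmup, train = split_burn_in(seq, burn_in=2)
        print("burn-in:", warmup, "train:", train)
```

In this sketch a longer burn-in gives the recurrent state more context before training begins, but every stored sequence grows by the same amount, which is one reason these values interact with replay memory size on constrained hardware.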