Publication Date
Fall 2020
Degree Type
Master's Project
Degree Name
Master of Science (MS)
Department
Computer Science
First Advisor
Teng Moh
Second Advisor
Katerina Potika
Third Advisor
Mike Wu
Keywords
multi-agent deep reinforcement learning, MADRL
Abstract
This project was motivated by the search for an AI method that moves toward Artificial General Intelligence (AGI), that is, one that more closely resembles the learning behavior of human beings. Among today's machine learning methods, Deep Reinforcement Learning (DRL) is the closest to AGI. To better understand DRL, we compare and contrast it with related methods: Deep Learning, Dynamic Programming, and Game Theory.
We apply a state-of-the-art DRL algorithm, Proximal Policy Optimization (PPO), to robot walker locomotion, a simple yet challenging environment with an inherently continuous and high-dimensional state/action space.
The end goal of this project is to train the agents by finding the optimal sequence of actions (policy/strategy) that leads the multiple walkers to move forward as far as possible, maximizing the accumulated reward (performance). This is accomplished by tuning the hyperparameters of the PPO algorithm while monitoring performance in multi-agent DRL (MADRL) settings.
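As an illustrative sketch of this setup (not the project's actual implementation), the snippet below trains PPO on a single-walker locomotion task with the buffer size and minibatch size exposed explicitly; the environment (BipedalWalker-v3), the library (Stable-Baselines3 with Gymnasium), and all hyperparameter values are assumptions for illustration only.

# Hypothetical sketch (not the project's code): PPO on a single-walker
# locomotion task, with the buffer size and minibatch size set explicitly.
# Environment name, library, and values are assumptions.
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("BipedalWalker-v3")  # continuous, high-dimensional state/action space

model = PPO(
    "MlpPolicy",
    env,
    n_steps=2048,       # transitions collected into the buffer before each update
    batch_size=256,     # minibatch size for the clipped-surrogate gradient steps
    n_epochs=10,        # passes over the collected buffer per update
    learning_rate=3e-4,
    clip_range=0.2,     # PPO clipping parameter
    verbose=1,
)
model.learn(total_timesteps=1_000_000)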
In the end, we can draw three conclusions from our findings based on the various MADRL experiments: 1) Unlike DL with explicit target labels, DRL needs a larger minibatch size to better estimate values from the various gradients; therefore, the minibatch size and its pool size (the experience replay buffer) are critical hyperparameters of the PPO algorithm. 2) For homogeneous multi-agent environments, tuned hyperparameters are mutually transferable between single-agent and multi-agent environments and can be reused. 3) For homogeneous multi-agent environments with a well-tuned hyperparameter set, parameter sharing is the better strategy for MADRL in terms of performance and efficiency, with fewer parameters and less memory.
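As a minimal sketch of the parameter-sharing strategy in conclusion 3), assuming PettingZoo's multiwalker environment, SuperSuit wrappers, and Stable-Baselines3 rather than the project's actual framework, the snippet below trains one shared PPO policy whose single set of network weights acts for all homogeneous walkers.

# Hypothetical sketch (not the project's code) of parameter sharing for
# homogeneous agents: every walker is controlled by one shared PPO policy,
# obtained by flattening the multi-agent environment into a vectorized
# single-agent one. Assumes PettingZoo, SuperSuit, and Stable-Baselines3.
import supersuit as ss
from pettingzoo.sisl import multiwalker_v9
from stable_baselines3 import PPO

env = multiwalker_v9.parallel_env(n_walkers=3)
env = ss.pettingzoo_env_to_vec_env_v1(env)    # each walker becomes one vectorized sub-env
env = ss.concat_vec_envs_v1(env, 4, base_class="stable_baselines3")

# One set of network parameters is trained on all walkers' transitions,
# i.e. the parameter-sharing strategy described in conclusion 3).
shared_policy = PPO("MlpPolicy", env, n_steps=2048, batch_size=256, verbose=1)
shared_policy.learn(total_timesteps=2_000_000)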
To conclude, DRL, as reward-driven, sequential, and evaluative learning, would be closer to AGI if multiple DRL agents learned to collaborate to capture the true signal from a shared environment. This work provides one instance of such implicit cooperative learning in MADRL.
Recommended Citation
Park, Inhee, "Multi-Agent Deep Reinforcement Learning for Walkers" (2020). Master's Projects. 972.
DOI: https://doi.org/10.31979/etd.tpey-94k6
https://scholarworks.sjsu.edu/etd_projects/972