Multi-Agent Deep Reinforcement Learning for Walker Systems

Publication Date

1-1-2021

Document Type

Conference Proceeding

Publication Title

Proceedings - 20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021

DOI

10.1109/ICMLA52953.2021.00082

First Page

490

Last Page

495

Abstract

We applied a state-of-the-art Deep Reinforcement Learning (DRL) algorithm, Proximal Policy Optimization (PPO), to a minimal legged-robot locomotion task in challenging multi-agent, continuous, high-dimensional state-space environments. The main contribution of this work is identifying the key hyperparameters and their effects on performance in multi-agent settings as the number of agents varies. Based on comprehensive experiments in multi-walker environments with 2-10 walkers, we found that 1) the minibatch size and the sample reuse ratio (the experience replay buffer size relative to the minibatch size) are critical hyperparameters for improving PPO performance; 2) the optimal neural network size depends on the number of walkers in the multi-agent environment; and 3) parameter sharing among agents is a better training strategy than fully independent learning, achieving comparable performance with improved efficiency, fewer parameters, and lower memory consumption.
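
The following is a minimal sketch, not the authors' code, illustrating the difference between parameter sharing and fully independent learning for a multi-walker PPO setup. The observation and action dimensions, network width, and number of walkers are assumptions chosen only for illustration.

```python
# Hypothetical sketch: shared vs. independent policies for multi-walker PPO.
# All dimensions below are assumptions, not values from the paper.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, HIDDEN = 31, 4, 128   # assumed per-walker dimensions
N_WALKERS = 3                           # the paper varies this from 2 to 10

def make_policy() -> nn.Module:
    # Simple policy trunk; a full PPO agent would add a value head and log-std.
    return nn.Sequential(
        nn.Linear(OBS_DIM, HIDDEN), nn.Tanh(),
        nn.Linear(HIDDEN, HIDDEN), nn.Tanh(),
        nn.Linear(HIDDEN, ACT_DIM),
    )

# Parameter sharing: one network serves every walker, so the number of
# trainable parameters (and memory) stays constant as N_WALKERS grows.
shared_policy = make_policy()
shared_actions = [shared_policy(torch.randn(OBS_DIM)) for _ in range(N_WALKERS)]

# Fully independent learning: each walker trains its own copy, multiplying
# the parameter count and memory footprint by the number of walkers.
independent_policies = [make_policy() for _ in range(N_WALKERS)]
independent_actions = [p(torch.randn(OBS_DIM)) for p in independent_policies]
```

In this sketch, the shared setup trains a single set of weights on every walker's experience, which is the efficiency advantage the abstract attributes to parameter sharing.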

Keywords

Deep Reinforcement Learning (DRL), Multi-agent DRL (MADRL), Proximal Policy Optimization (PPO)

Department

Computer Science
