Accelerated Reward Policy (ARP) for Robotics Deep Reinforcement Learning
Publication Date
1-1-2022
Document Type
Conference Proceeding
Publication Title
Lecture Notes in Networks and Systems
Volume
439 LNNS
DOI
10.1007/978-3-030-98015-3_15
First Page
222
Last Page
234
Abstract
The reward policy is a crucial component of Deep Reinforcement Learning (DRL) applications in Robotics. The challenge of building autonomous systems with “human-like” behavior has created a significant need for better, faster, and more robust training based on an optimized reward function. Inspired by work from Berkeley and Google, this paper presents our recent development in reward policy/function design. In particular, we have formulated an accelerated reward policy (ARP) based on a non-linear function. We applied this reward function to the Soft Actor-Critic (SAC) algorithm to train a 6 DoF (Degree of Freedom) robot in a simulated environment built on the Unity gaming platform. This nonlinear ARP function assigns larger rewards to accelerate the robot’s positive behavior during training. Compared to the existing algorithm, our experimental results demonstrate faster convergence and a larger accumulated reward. With limited experimental data, the results show the accumulated reward improved by as much as 2 times over the previous results.
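The abstract does not state the ARP formula, but the core idea — a nonlinear function that pays out disproportionately larger rewards as the robot's behavior improves — can be sketched as below. The distance-based shaping, the `alpha` gain, and the function name are illustrative assumptions for this sketch, not the authors' actual policy.

```python
import math

def accelerated_reward(distance_to_goal, max_distance=1.0, alpha=2.0):
    """Hypothetical nonlinear reward sketch (not the paper's formula).

    Maps the robot's distance to its goal into a progress score in [0, 1],
    then amplifies it exponentially so that small improvements near the
    goal earn much larger rewards than the same improvements far away.
    `alpha` controls how aggressively positive behavior is accelerated.
    """
    progress = 1.0 - min(distance_to_goal, max_distance) / max_distance
    return math.exp(alpha * progress) - 1.0  # 0 at max distance, grows nonlinearly


# The shaped reward rises steeply as the robot closes in on the goal:
for d in (1.0, 0.5, 0.1, 0.0):
    print(f"distance={d:.1f} -> reward={accelerated_reward(d):.3f}")
```

In an SAC training loop this shaped value would simply replace a linear distance penalty in the environment's step function; the exponential curvature is what front-loads the learning signal for positive behavior.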
Funding Number
034-1312-1082
Funding Sponsor
San José State University
Keywords
6 DoF robot, Autonomous systems, Deep Reinforcement Learning, Machine learning, Unity
Department
Computer Engineering
Recommended Citation
Harry Li, Chee Vang, Shifabanu Shaikh, Yusuke Yakuwa, Allen Lee, Nitin Patil, Nisarg Vadher, Zhixuan Zhou, and Shuwen Zheng. "Accelerated Reward Policy (ARP) for Robotics Deep Reinforcement Learning" Lecture Notes in Networks and Systems (2022): 222-234. https://doi.org/10.1007/978-3-030-98015-3_15