Reinforcement learning (RL) has been applied to robotics and many other domains in which a system must learn in real time while interacting with a dynamic environment. In most studies, the state-action space, a key component of RL, is predefined. The integration of RL with deep learning methods, however, has taken a tremendous leap forward, solving novel and challenging problems such as mastering the board game of Go. The environment surrounding the agent may not be fully visible, the environment can change over time, and the feedback the agent receives for its actions can arrive with a fluctuating delay. In this paper, we propose a Generic Online Learning (GOL) system for such environments. GOL is based on RL with a hierarchical structure that forms abstract features over time and adapts toward optimal solutions. The proposed method has been applied to load balancing in 5G cloud radio access networks. Simulation results show that GOL successfully achieves the system objectives of reducing cache misses and communication load, while incurring only limited system overhead in terms of the number of high-level patterns needed. We believe the proposed GOL architecture is significant for future online learning in dynamic, partially visible environments and would be useful for many autonomous control systems.
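To make the delayed-feedback setting concrete, the following sketch shows tabular Q-learning in which each reward is buffered and applied only after a fixed delay. This is an illustrative toy, not the paper's GOL algorithm; the environment interface, state/action sets, and all parameter values are hypothetical.

```python
import random
from collections import defaultdict, deque

def q_learning_delayed(env_step, states, actions, episodes=200,
                       steps=50, delay=3, alpha=0.1, gamma=0.9,
                       epsilon=0.1):
    """Tabular Q-learning where (state, action) transitions are
    buffered until their delayed feedback 'arrives', then updated
    with the standard Q-learning rule. Illustrative only."""
    q = defaultdict(float)  # Q[(state, action)]

    def apply_update(ps, pa, ps_next, pr):
        best_next = max(q[(ps_next, a_)] for a_ in actions)
        q[(ps, pa)] += alpha * (pr + gamma * best_next - q[(ps, pa)])

    for _ in range(episodes):
        pending = deque()  # transitions awaiting delayed feedback
        s = random.choice(states)
        for _ in range(steps):
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a_: q[(s, a_)])
            s_next, r = env_step(s, a)
            pending.append((s, a, s_next, r))
            # update only once the feedback has "arrived"
            if len(pending) > delay:
                apply_update(*pending.popleft())
            s = s_next
        # flush remaining buffered feedback at episode end
        while pending:
            apply_update(*pending.popleft())
    return q
```

A usage sketch: on a toy two-state environment that rewards action 1, the learned Q-values for action 1 come to dominate those for action 0 even though every update is applied several steps late.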
Shahriari, Behrooz, "Generic Online Learning for Partial Visible & Dynamic Environment with Delayed Feedback" (2017). Master's Projects. 572.