Robust Underactuated Point-Feet Bipedal Locomotion Using Deep Reinforcement Learning and a Balance Recovery System

Aref Amiri, University of Oulu
Soroush Zare, University of Virginia School of Engineering and Applied Science
Mojtaba Sharifi, San Jose State University

Abstract

This study proposes a deep reinforcement learning control strategy, based on the twin delayed deep deterministic policy gradient (TD3) algorithm, for the robust locomotion of a point-feet, underactuated bipedal robot. We introduce two key contributions: a specialized balance recovery system and a bioinspired reward function. The balance recovery system is explicitly trained to handle off-balance and fall-like conditions. Its effectiveness was validated through 50 randomized trials, in which it achieved a 74% success rate in stabilizing the robot from a wide range of initial heights, velocities, and configurations. The bioinspired reward function encourages the robot's hip to remain between its feet, which significantly improved gait stability. This reward shaping reduced the normalized fluctuation in joint angle movements by a factor of 1.75, even under external disturbances. The final controller produced an average running speed of 2.4 m/s and demonstrated robustness to external disturbances of up to ±60 N·m, paving the way for more resilient and adaptive bipedal locomotion.
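To make the idea of the bioinspired reward concrete, the following is a minimal sketch of one way a "hip between the feet" shaping term could be written. It is not the reward used in this study: the function name, the use of horizontal positions only, the linear penalty, and the `weight` parameter are all assumptions made for illustration.

```python
def hip_between_feet_reward(hip_x, foot_a_x, foot_b_x, weight=1.0):
    """Hypothetical shaping term (illustrative, not the paper's formula).

    Returns 0.0 while the hip's horizontal position lies between the two
    feet, and a negative penalty that grows linearly with the distance by
    which the hip leaves that interval.
    """
    # Order the feet so the support interval is [lo, hi] regardless of
    # which foot is currently in front.
    lo = min(foot_a_x, foot_b_x)
    hi = max(foot_a_x, foot_b_x)

    if lo <= hip_x <= hi:
        return 0.0  # hip is between the feet: no penalty

    # Distance by which the hip overshoots the nearest foot.
    overshoot = lo - hip_x if hip_x < lo else hip_x - hi
    return -weight * overshoot
```

In a training loop, a term like this would typically be added to the other reward components (for example, forward velocity) at each simulation step, with `weight` tuned so that it shapes the gait without dominating the task reward.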