Publication Date
Fall 2025
Degree Type
Thesis
Degree Name
Master of Science (MS)
Department
Computer Engineering
Advisor
Jun Liu; Bernardo Flores; Mahima Agumbe Suresh
Abstract
Learning from demonstrations offers a path around the sample inefficiency of reinforcement learning, but obtaining action-labeled expert demonstrations remains expensive and often impractical. Learning from Observations (LFO) addresses this by learning policies from observation-only demonstrations. Recent LFO work relies heavily on behavior cloning (BC): VPT and LAPO combine observation-only data with limited action labels to train BC policies, while AIME offers an alternative policy inference approach but requires the majority of its training data to carry action labels. Through systematic experiments in the Lunar Lander environment, we investigate whether latent action methods remain effective when state and action dimensionalities are comparable, and whether diverse non-expert data produces better latent representations than expert-only data. Our analysis reveals that the distribution of decoder errors matters more than overall decoder accuracy for sequential decision-making, and that diverse trajectories create latent spaces that enable better policy performance despite lower decoder accuracy. These findings provide practical guidance for designing latent action systems. Building on these insights, our primary contribution is adapting AIME to operate in latent action space by using LAPO's Forward Dynamics Model, trained on unlabeled, non-expert trajectories, as the world model in AIME's framework. This significantly expands AIME's applicability as an LFO technique: rather than requiring action labels for most training data, our approach needs them only for a small decoder training set. It also introduces an alternative to behavior cloning for LFO methods, enabling trajectory-level policy optimization in observation-only settings. We demonstrate that, when expert demonstrations are limited, policies trained via latent-space AIME achieve performance comparable to BC trained with action labels.
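For readers unfamiliar with the setup, the sketch below illustrates the pipeline the abstract describes: a LAPO-style inverse/forward dynamics pair is pretrained on unlabeled observation transitions, the frozen Forward Dynamics Model then serves as the world model for AIME-style policy inference, and a small decoder, fit on the few action-labeled transitions, maps latent actions back to environment actions. This is a minimal PyTorch sketch, not code from the thesis; the dimensions, network sizes, and names (mlp, lapo_pretrain_step, aime_latent_step) are illustrative assumptions.

import torch
import torch.nn as nn

OBS_DIM, LATENT_DIM, N_ACTIONS = 8, 4, 4  # Lunar Lander-like shapes (assumed)

def mlp(inp, out, hidden=128):
    return nn.Sequential(
        nn.Linear(inp, hidden), nn.ReLU(),
        nn.Linear(hidden, out),
    )

# LAPO-style latent world model, pretrained on observation-only data:
# the IDM infers a latent action from (o_t, o_{t+1}); the FDM predicts
# o_{t+1} from (o_t, latent action).
idm = mlp(2 * OBS_DIM, LATENT_DIM)
fdm = mlp(OBS_DIM + LATENT_DIM, OBS_DIM)
wm_opt = torch.optim.Adam(list(idm.parameters()) + list(fdm.parameters()), lr=3e-4)

def lapo_pretrain_step(obs, next_obs):
    """One FDM/IDM update on a batch of unlabeled, non-expert transitions."""
    z = idm(torch.cat([obs, next_obs], dim=-1))
    pred = fdm(torch.cat([obs, z], dim=-1))
    loss = nn.functional.mse_loss(pred, next_obs)
    wm_opt.zero_grad(); loss.backward(); wm_opt.step()
    return loss.item()

# AIME-style policy inference in latent action space: freeze the FDM and
# fit a policy whose latent actions, rolled through it, reproduce the
# expert's observation sequence (no expert action labels needed).
policy = mlp(OBS_DIM, LATENT_DIM)
pi_opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
for p in fdm.parameters():
    p.requires_grad_(False)

def aime_latent_step(expert_obs):
    """expert_obs: (T+1, OBS_DIM) observation-only expert trajectory."""
    o, o_next = expert_obs[:-1], expert_obs[1:]
    pred_next = fdm(torch.cat([o, policy(o)], dim=-1))
    loss = nn.functional.mse_loss(pred_next, o_next)
    pi_opt.zero_grad(); loss.backward(); pi_opt.step()
    return loss.item()

# A small decoder, trained on the limited action-labeled set, maps latent
# actions to logits over the environment's discrete actions at execution time.
decoder = mlp(LATENT_DIM, N_ACTIONS)

Only the decoder step touches action labels, which is the point made above: the world model and the policy are trained entirely from observations.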
Recommended Citation
Kandekar, Rahul Milind, "Latent Action Trajectory Optimization" (2025). Master's Theses. 5743.
DOI: https://doi.org/10.31979/etd.xges-229r
https://scholarworks.sjsu.edu/etd_theses/5743