Publication Date

Fall 2025

Degree Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Engineering

Advisor

Jun Liu; Bernardo Flores; Mahima Agumbe Suresh

Abstract

Learning from demonstrations offers a path to bypass the sample inefficiency of reinforcement learning, but obtaining action-labeled expert demonstrations remains expensive and often impractical. Learning from Observations (LFO) addresses this by learning policies from observation-only demonstrations. Recent LFO work relies heavily on behavior cloning: VPT and LAPO use observation-only data combined with limited action labels to train BC policies, while AIME offers an alternative policy inference approach but requires the majority of its training data to have action labels. Through systematic experiments in the Lunar Lander environment, we investigate whether latent action methods can function when state and action dimensionalities are comparable, and whether diverse non-expert data produces better latent representations than expert-only data. Our analysis reveals that error distribution patterns matter more than overall decoder accuracy for sequential decision-making, and that diverse trajectories create latent spaces enabling better policy performance despite lower decoder accuracy. These findings provide practical guidance for designing latent action systems. Building on these insights, our primary contribution is adapting AIME to operate in latent action space by using LAPO’s Forward Dynamics Model—trained on unlabeled, non-expert trajectories—as the world model in AIME’s framework. This significantly expands AIME’s applicability as an LFO technique: rather than requiring action labels for most training data, our approach requires them only for a small decoder training set. This also introduces an alternative to behavior cloning for LFO methods, enabling trajectory-level policy optimization in observation-only settings. We demonstrate that policies trained via latent-space AIME achieve performance comparable to BC trained with action labels when expert demonstrations are limited.

Available for download on Saturday, August 15, 2026
