Publication Date

Fall 2025

Degree Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Engineering

Advisor

Jun Liu; Bernardo Flores; Mahima Agumbe Suresh

Abstract

Learning from demonstrations offers a path to bypass the sample inefficiency of reinforcement learning, but obtaining action-labeled expert demonstrations remains expensive and often impractical. Learning from Observations (LFO) addresses this by learning policies from observation-only demonstrations. Recent LFO work relies heavily on behavior cloning (BC): VPT and LAPO combine observation-only data with limited action labels to train BC policies, while AIME offers an alternative policy inference approach but requires action labels for the majority of its training data. Through systematic experiments in the Lunar Lander environment, we investigate whether latent action methods can function when state and action dimensionalities are comparable, and whether diverse non-expert data produces better latent representations than expert-only data. Our analysis reveals that error distribution patterns matter more than overall decoder accuracy for sequential decision-making, and that diverse trajectories create latent spaces that enable better policy performance despite lower decoder accuracy. These findings provide practical guidance for designing latent action systems. Building on these insights, our primary contribution is adapting AIME to operate in latent action space by using LAPO’s Forward Dynamics Model, trained on unlabeled, non-expert trajectories, as the world model in AIME’s framework. This significantly expands AIME’s applicability as an LFO technique: rather than requiring action labels for most training data, our approach requires them only for a small decoder training set. It also introduces an alternative to behavior cloning for LFO methods, enabling trajectory-level policy optimization in observation-only settings. We demonstrate that policies trained via latent-space AIME achieve performance comparable to BC trained with action labels when expert demonstrations are limited.

Recommended Citation

Kandekar, Rahul Milind, "Latent Action Trajectory Optimization" (2025). Master's Theses. 5743.
DOI: https://doi.org/10.31979/etd.xges-229r
https://scholarworks.sjsu.edu/etd_theses/5743

Download

Available for download on Saturday, August 15, 2026

Included in

Computer Engineering Commons

DOI

https://doi.org/10.31979/etd.xges-229r

Master's Theses

Latent Action Trajectory Optimization
