
Publication Date

Fall 2024

Degree Type

Thesis - Campus Access Only

Degree Name

Master of Science (MS)

Department

Computer Engineering

Advisor

Bernardo Flores; Christian Schroeder de Witt; Xiao Su

Abstract

From autonomous drones and bipedal robots to the alignment of Large Language Models (LLMs), Reinforcement Learning (RL) is ubiquitous in Artificial Intelligence (AI) today. A key problem in deploying RL agents in the real world is protecting them from action-space and observation-space attacks, among which attacks with low detectability are especially damaging. The adversarial observation-space attacks studied in this thesis perturb the input state of the victim's policy so that the victim's reward is minimized. Illusory observation-space attacks are additionally constrained to have low detectability, with the detectability budget defined by exact information-theoretic distributional constraints. Learning effective adversarial attacks is challenging because the terms associated with each additional time step are conditioned on the entire action-observation history. In this work, we extend the information-theoretic formulation of illusory attacks introduced in previous work to two time steps. We conduct experiments on a custom environment and show that our extended implementation learns an effective adversarial policy that significantly reduces the victim's reward while remaining difficult to detect. We explain why the approach works and list the current limitations and the factors that affect the performance of our implementation. The thesis also discusses possible domain extensions and applications, the ethical considerations of open-sourcing this kind of potentially harmful AI, and future work directions that could address the current limitations and caveats.
