Publication Date

Spring 5-24-2021

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

Christopher J Pollett

Second Advisor

Robert Chun

Third Advisor

Sunhera Paul

Keywords

Affordance Prediction, Heat Map, ConvLSTM

Abstract

The rapid development of autonomous robots is transforming the manufacturing and healthcare industries, but these robots still face many challenges. One such challenge is their inability to manipulate an unknown object without human supervision. One way to enable such manipulation is affordance learning [1]. An affordance describes the action a user can perform on an object in given surroundings. This report describes our proposed model, which detects and predicts the affordance of an object from videos by leveraging spatio-temporal feature extraction through ConvLSTM and Fully Convolutional Networks. Our model is built on an encoder-decoder architecture. The encoder consists of a CNN that captures the spatial features of the input frames and a ConvLSTM that captures their temporal dynamics. The decoder uses the encoder's output to classify the affordance of a given task and to predict the interaction region between the human and the object in the form of a heatmap. It is composed of an LSTM, which classifies the affordance, and a Fully Convolutional Network, which predicts the heatmap of the interaction region.
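To illustrate the temporal half of the encoder, the following is a minimal sketch of one ConvLSTM update, written in plain Python so that it runs without any deep learning library. It is not the project's implementation: the abstract does not give layer counts, kernel sizes, or channel widths, so this sketch assumes a single channel, zero "same" padding, and the common ConvLSTM variant without peephole connections. The names `conv2d_same`, `convlstm_step`, and `encode_sequence` are hypothetical and chosen only for this example.

```python
import math


def conv2d_same(x, k):
    """Single-channel 2D cross-correlation with zero padding (output has x's shape)."""
    h, w = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for a in range(kh):
                for b in range(kw):
                    y, z = i + a - ph, j + b - pw
                    if 0 <= y < h and 0 <= z < w:  # outside the frame counts as zero
                        s += x[y][z] * k[a][b]
            out[i][j] = s
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def convlstm_step(x, h_prev, c_prev, kernels):
    """One ConvLSTM update on a single-channel frame (assumed simplification).

    kernels maps each gate name ("i", "f", "o", "g") to a tuple
    (input kernel, hidden kernel, scalar bias).
    """
    rows, cols = len(x), len(x[0])
    pre = {}
    for gate, (k_in, k_hid, bias) in kernels.items():
        from_x = conv2d_same(x, k_in)
        from_h = conv2d_same(h_prev, k_hid)
        pre[gate] = [[from_x[i][j] + from_h[i][j] + bias for j in range(cols)]
                     for i in range(rows)]
    h_new = [[0.0] * cols for _ in range(rows)]
    c_new = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            i_g = sigmoid(pre["i"][i][j])    # input gate
            f_g = sigmoid(pre["f"][i][j])    # forget gate
            o_g = sigmoid(pre["o"][i][j])    # output gate
            g = math.tanh(pre["g"][i][j])    # candidate cell update
            c_new[i][j] = f_g * c_prev[i][j] + i_g * g
            h_new[i][j] = o_g * math.tanh(c_new[i][j])
    return h_new, c_new


def encode_sequence(frames, kernels):
    """Run the cell over a list of equally sized frames; return the last hidden map."""
    rows, cols = len(frames[0]), len(frames[0][0])
    h = [[0.0] * cols for _ in range(rows)]
    c = [[0.0] * cols for _ in range(rows)]
    for frame in frames:
        h, c = convlstm_step(frame, h, c, kernels)
    return h
```

Because the gates are computed by convolution rather than by a dense matrix product, the hidden state keeps the spatial layout of the frame. That property is what would let a convolutional decoder turn the encoder's output into a per-pixel heatmap. In the model described above, the frames would first pass through the CNN, so `x` would be a feature map rather than raw pixels.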
