Yichen Lin

Publication Date

Spring 2024

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)


Department

Computer Science

First Advisor

Katerina Potika

Second Advisor

Cristina Tortora

Third Advisor

Faranak Abri


Keywords

Clustering, Natural Language Processing, Transfer Learning, Large Language Models, Hotel Reviews, Opinion Graphs, Knowledge Graphs


Abstract

With the rapid development of the Internet, reading online reviews before making a purchase, booking a hotel, or making a restaurant reservation has become part of daily life. Customers often treat reviews as crucial supplementary information when deciding how to spend their money. However, reading many reviews to extract helpful information takes time and effort. This project proposes a new method, OpinionGraphGenerator, that creates opinion graphs from hotel reviews to reduce the volume of review text while preserving essential insights. In an opinion graph, each vertex is a group of semantically similar opinions, where each opinion consists of an opinion term (a description of an aspect, e.g., tasty) and an aspect term (an entity, e.g., food). Edges represent explanatory relationships between vertices: a directed edge from vertex A to vertex B means that the semantically similar opinions in vertex A explain the opinions in vertex B. We focus on hotel reviews. The current method of constructing opinion graphs [1] requires a human-annotated hotel dataset, where the label is the existence of a directed edge. Unfortunately, the dataset provided in [1] does not identify the entity (hotel) corresponding to each review, making it impossible to recreate opinion graphs for each hotel. We therefore explore how to create opinion graphs on a new hotel review dataset without labels. With transfer learning, we can reduce the need for labeled data when training a model, thus significantly reducing the labor cost of labeling our dataset. Specifically, transfer learning with large language models is used to identify sentences, extract opinions, and mine explanatory relationships from reviews. Next, natural language processing methods are used to generate word embeddings for the reviews; these embeddings are then clustered to identify semantically similar opinions. Finally, an opinion graph is generated, with the groups of semantically similar opinions as vertices and the extracted explanatory relationships as directed edges.
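
The opinion graph structure described in the abstract can be illustrated with a minimal Python sketch. This is not the OpinionGraphGenerator implementation: the class names `Opinion` and `OpinionGraph`, their methods, and the sample opinions other than "tasty food" are hypothetical and chosen only to mirror the definitions above (a vertex is a group of semantically similar opinions, and a directed edge from A to B means that A explains B).

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Opinion:
    """One opinion: an opinion term paired with an aspect term."""
    opinion_term: str  # description of an aspect, e.g. "tasty"
    aspect_term: str   # entity being described, e.g. "food"


@dataclass
class OpinionGraph:
    """Hypothetical container for an opinion graph (illustration only)."""
    # vertex id -> group of semantically similar opinions
    vertices: Dict[int, List[Opinion]] = field(default_factory=dict)
    # (a, b) means: the opinions in vertex a explain the opinions in vertex b
    edges: Set[Tuple[int, int]] = field(default_factory=set)

    def add_vertex(self, opinions: Iterable[Opinion]) -> int:
        """Store a group of similar opinions and return its vertex id."""
        vid = len(self.vertices)
        self.vertices[vid] = list(opinions)
        return vid

    def add_edge(self, src: int, dst: int) -> None:
        """Record that vertex src explains vertex dst."""
        if src not in self.vertices or dst not in self.vertices:
            raise KeyError("both endpoints must be existing vertices")
        self.edges.add((src, dst))

    def explanations_for(self, vid: int) -> List[int]:
        """Return the ids of vertices that explain vertex vid."""
        return sorted(src for src, dst in self.edges if dst == vid)


# Made-up sample data: two similar opinions about food form vertex a,
# which is assumed here to explain the opinion in vertex b.
graph = OpinionGraph()
a = graph.add_vertex([Opinion("tasty", "food"), Opinion("delicious", "food")])
b = graph.add_vertex([Opinion("great", "restaurant")])
graph.add_edge(a, b)
```

In this sketch, `graph.explanations_for(b)` returns the ids of the vertices whose opinions explain vertex `b`. In the method the abstract describes, the grouping into vertices would come from clustering embeddings and the edges from mined explanatory relationships; here both are entered by hand.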

Available for download on Sunday, May 25, 2025