Publication Date

Spring 2023

Degree Type

Master's Project

Degree Name

Master of Science (MS)


Department

Computer Science

First Advisor

Katerina Potika

Second Advisor

Robert Chun

Third Advisor

Chris Pollett


Keywords

Knowledge Graph Construction, SpaCy, TEI, Wikidata


Textbooks are written and organized in a way that facilitates learning and understanding. Sections such as the glossary at the end of a textbook provide guidance on the topic of interest. However, it takes manual effort to create the index terms in the glossary that highlight the key terminology and related terms. Knowledge graphs, which have been used to represent and even reason over data and knowledge, can potentially capture a textbook's important terms, concepts, and their relations. Popular since their introduction by Google, knowledge graphs (KGs) combine graph structure and data to capture and model enormous numbers of relational facts in fields ranging from social media to the sciences. Recently, techniques have been developed to extract knowledge bases from textbooks. Once the knowledge graph of a textbook is available, we can perform completion tasks, that is, predicting missing entities or relations, by representing the knowledge graph in a low-dimensional space.

The main objective of the project is to apply knowledge graph construction techniques to textbooks. The main challenge is the absence of a domain-specific schema for each textbook. We use different entity and relation extraction models to capture logical and semantic information related to the textbook topic. A Text Encoding Initiative (TEI) model was employed to extract hierarchical concepts from a textbook; spaCy and Google Cloud NLP were used to extract semantic information from the main textual content of a textbook. A case study on a cloud computing textbook was conducted and evaluated with each of the approaches.