Publication Date

Spring 2021

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

Christopher J Pollett

Second Advisor

Katerina Potika

Third Advisor

Robert Chun

Keywords

RDF, SPARQL, Question Answering System, Wikidata, Tree-LSTM

Abstract

The Semantic Web is an extensive knowledge base that contains facts in the form of RDF
triples. These facts are not easily accessible to the average user because using them requires
an understanding of ontologies and a query language such as SPARQL. Question answering systems
form a layer of abstraction on linked data to overcome these issues. These systems allow the
user to input a question in a natural language and receive the equivalent SPARQL query. The
user can then execute the query on the database to fetch the desired results. The standard
techniques involved in translating natural language questions to SPARQL queries are natural
language processing, machine learning, and information retrieval.
In this report, we describe our English language to SPARQL query translation system. The
proposed system takes a complete question in English as input, identifies the type of query
to be built, and extracts the triples from the question to fill in the query. The system
contains two components: a template classification module, which uses a Tree-LSTM to
identify the query template, and an entity recognition module, which uses external libraries to
recognize the triples in the question. The LC-QuAD dataset, with 200 questions across two
unique SPARQL templates, was used to train and evaluate the model. The system queries the
Wikidata database to answer the questions and returns correct results for 60% of the questions.
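
As an illustration only, the final step of such a pipeline (filling a chosen query template with recognized identifiers) might look like the following minimal Python sketch. The template names, the two template bodies, and the build_query helper are assumptions made for this example; the abstract does not list the project's actual templates or code. Q42 (Douglas Adams) and P19 (place of birth) are real Wikidata identifiers used as sample slot values.

```python
from string import Template

# Hypothetical templates: the abstract mentions two SPARQL templates
# but does not show them, so these two are stand-ins for illustration.
TEMPLATES = {
    "select_object": Template(
        "SELECT ?answer WHERE { wd:$subject wdt:$predicate ?answer }"
    ),
    "ask_triple": Template(
        "ASK WHERE { wd:$subject wdt:$predicate wd:$object }"
    ),
}


def build_query(template_id, slots):
    """Fill the selected template with identifiers found in the question.

    template_id stands in for the output of a template classifier;
    slots stands in for the output of an entity recognition step.
    Raises KeyError if the template is unknown or a slot is missing.
    """
    return TEMPLATES[template_id].substitute(slots)


# Sample slots for a question such as "Where was Douglas Adams born?"
query = build_query("select_object", {"subject": "Q42", "predicate": "P19"})
print(query)
```

The resulting string could then be sent to a Wikidata SPARQL endpoint; that step is omitted here because the abstract does not describe how the project executes its queries.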
