Publication Date

Spring 6-23-2017

Degree Type

Master's Project

Degree Name

Master of Science (MS)

Department

Computer Science

First Advisor

Thanh D. Tran

Second Advisor

Robert Chun

Third Advisor

James Casaletto


Researchers are increasingly applying Natural Language Processing (NLP) techniques in domains such as information extraction and text mining. One of the challenges in this area is building an accurate Question Answering (QA) system. Question type recognition is the most significant task in a QA system such as a chatbot. The National Institute of Standards and Technology (NIST) hosts the Text REtrieval Conference (TREC) series, which holds a competition every year to encourage and improve techniques for retrieving information from large text corpora. When a user asks a question, they expect an answer of the appropriate form in reply. Question classification is the task of predicting the type of a question written in natural language: the question is assigned to one of a set of predefined question types. The objective of this project is to build a question type recognition system using big data and machine learning techniques. The system consists of a supervised learning model that receives a natural language question as input and classifies it by question type. Extracting important textual features and building a model from those features is the most important task of this project. The training and testing data were obtained from the TREC website. The training data consists of a corpus of unique questions and their associated labels. The model is tested and evaluated using the testing data. This project also achieves the goal of building a scalable system using big data technologies.
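As a rough illustration of the supervised approach described above, the following minimal sketch trains a bag-of-words classifier on labeled questions and predicts the type of a new one. It is not the project's actual model or feature set: the multinomial Naive Bayes classifier, the class name `NaiveBayesQuestionClassifier`, the toy questions, and the labels `HUM`, `LOC`, and `NUM` are all assumptions chosen for this example, and it runs on a single machine rather than on a big data platform.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(question):
    """Lowercase a question and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", question.lower())


class NaiveBayesQuestionClassifier:
    """Multinomial Naive Bayes over bag-of-words features (illustrative only)."""

    def fit(self, questions, labels):
        # Count how many questions carry each label and how often
        # each word appears under each label.
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocabulary = set()
        for question, label in zip(questions, labels):
            tokens = tokenize(question)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, question):
        tokens = tokenize(question)
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, float("-inf")
        for label, doc_count in self.label_counts.items():
            # Log prior plus add-one smoothed log likelihood of each token.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data invented for this sketch; not taken from the TREC corpus.
train_questions = [
    "Who wrote Hamlet ?",
    "Who discovered penicillin ?",
    "Where is the Eiffel Tower ?",
    "Where was Mozart born ?",
    "How many planets are in the solar system ?",
    "How many legs does a spider have ?",
]
train_labels = ["HUM", "HUM", "LOC", "LOC", "NUM", "NUM"]

model = NaiveBayesQuestionClassifier().fit(train_questions, train_labels)
prediction = model.predict("Who painted the Mona Lisa ?")
print(prediction)
```

In this sketch the only textual features are word counts, so the question word ("who", "where", "how many") dominates the decision. A fuller system would likely add richer features and evaluate on held-out test questions, as the abstract describes.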