Publication Date

Spring 2020

Degree Type

Master's Project

Degree Name

Master of Science (MS)


Department

Computer Science

First Advisor

Chris Pollett

Second Advisor

Robert Chun

Third Advisor

Mike Wu


Keywords

Natural Language Processing, Chinese, Chinese Word Segmentation, Part-of-Speech Tagging, Named Entity Recognition, Question Answering System


Abstract

Natural Language Processing (NLP) is the analysis of human languages by computers. NLP spans many areas, including speech recognition, natural language understanding, and natural language generation.

Information retrieval and natural language processing for Asian languages have their own unique set of challenges not present for Indo-European languages. Some of these are text segmentation, named entity recognition in unsegmented text, and part-of-speech tagging. In this report, we describe our implementation of and experiments with improving the Chinese language processing sub-component of an open-source search engine, Yioop. In particular, we rewrote and improved the following sub-systems of Yioop to bring them as close to the state of the art as possible: Chinese text segmentation, part-of-speech (POS) tagging, named entity recognition (NER), and the question answering system.

Compared to the previous system, we achieved a 9% improvement in Chinese word segmentation accuracy. We built a POS tagger with 89% accuracy, and we implemented an NER system with 76% accuracy.