Publication Date
Spring 2020
Degree Type
Master's Project
Degree Name
Master of Science (MS)
Department
Computer Science
First Advisor
Chris Pollett
Second Advisor
Robert Chun
Third Advisor
Mike Wu
Keywords
Natural Language Processing, Chinese, Chinese Word Segmentation, Part-of-Speech Tagging, Named Entity Recognition, Question Answering System
Abstract
Natural Language Processing (NLP) is the use of computers to analyze human language. NLP spans many areas, including speech recognition, natural language understanding, and natural language generation.
Information retrieval and natural language processing for Asian languages face a set of challenges not present for Indo-European languages. These include text segmentation, named entity recognition in unsegmented text, and part-of-speech tagging. In this report, we describe our implementation of, and experiments with, improvements to the Chinese language processing sub-component of Yioop, an open source search engine. In particular, we rewrote and improved the following Yioop sub-systems to bring them as close to the state of the art as possible: Chinese text segmentation, part-of-speech (POS) tagging, named entity recognition (NER), and the question answering system.
Compared to the previous system, our Chinese word segmentation achieved a 9% improvement in accuracy. Our POS tagger reaches 89% accuracy, and our NER system reaches 76% accuracy.
Recommended Citation
Sun, Xianghong, "Improved Chinese Language Processing for an Open Source Search Engine" (2020). Master's Projects. 925.
DOI: https://doi.org/10.31979/etd.fsj8-fwt5
https://scholarworks.sjsu.edu/etd_projects/925