Publication Date

Spring 2020

Degree Type

Master's Project

Degree Name

Master of Science (MS)

Department

Computer Science

First Advisor

Chris Pollett

Second Advisor

Robert Chun

Third Advisor

Mike Wu

Keywords

Natural Language Processing, Chinese, Chinese Word Segmentation, Part-of-Speech Tagging, Named Entity Recognition, Question Answering System

Abstract

Natural Language Processing (NLP) is the field concerned with how computers analyze human languages. It spans many areas, including speech recognition, natural language understanding, and natural language generation.

Information retrieval and natural language processing for Asian languages pose a unique set of challenges not present for Indo-European languages. Among them are text segmentation, named entity recognition in unsegmented text, and part-of-speech tagging. In this report, we describe our implementation of, and experiments with, improvements to the Chinese language processing sub-component of Yioop, an open source search engine. In particular, we rewrote and improved the following Yioop sub-systems to bring them as close to the state of the art as possible: Chinese text segmentation, part-of-speech (POS) tagging, named entity recognition (NER), and a question answering system.

Compared to the previous system, we achieved a 9% improvement in Chinese word segmentation accuracy. We also built a POS tagger with 89% accuracy and an NER system with 76% accuracy.

Recommended Citation

Sun, Xianghong, "Improved Chinese Language Processing for an Open Source Search Engine" (2020). Master's Projects. 925.
DOI: https://doi.org/10.31979/etd.fsj8-fwt5
https://scholarworks.sjsu.edu/etd_projects/925

Included in

Artificial Intelligence and Robotics Commons, Databases and Information Systems Commons

Master's Projects

Improved Chinese Language Processing for an Open Source Search Engine