Publication Date

Spring 2013

Degree Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Science

Advisor

Tsau Y. Lin

Keywords

Alergia Algorithm, Automaton, Function-Words, Pattern, PTA, Style of Writing

Subject Areas

Computer science

Abstract

Most documents written by humans are not just collections of words, sentences and paragraphs combined at random. It is believed that a pattern hidden behind these piles of characters represents the author's style of writing. In previous works and in this thesis, we assumed that this belief was true and tried to discover the pattern and represent it with automata. We used the Alergia algorithm to build an automaton from a prefix tree acceptor (PTA). Testing verified that the Alergia algorithm was correctly implemented in our software. However, our tests showed that we captured only the patterns of collections of single sentences in a book, which is not the full content of a book. Therefore, using variable chopping units, or a less forceful way of chopping the text, would be a promising direction.
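The abstract says the automaton is obtained by running Alergia on a prefix tree acceptor. The Python below is a minimal sketch of those two ingredients. It builds a PTA that stores frequency counts, and it implements the Hoeffding-bound compatibility test that Alergia uses to decide whether two states may be merged. It is not the thesis software, and it omits the merge-and-fold step. The names `Node`, `build_pta`, `different` and `compatible` are hypothetical. The input format is also an assumption: each sentence is taken to be reduced to its sequence of function words.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Hashable, List, Sequence


@dataclass
class Node:
    """One PTA state with the frequency counts Alergia needs (hypothetical layout)."""
    arrivals: int = 0  # number of sample strings that reach this state
    ends: int = 0      # number of sample strings that terminate here
    children: Dict[Hashable, "Node"] = field(default_factory=dict)


def build_pta(samples: List[Sequence[Hashable]]) -> Node:
    """Build a prefix tree acceptor; a child's `arrivals` is its edge frequency."""
    root = Node()
    for seq in samples:
        node = root
        node.arrivals += 1
        for sym in seq:
            node = node.children.setdefault(sym, Node())
            node.arrivals += 1
        node.ends += 1
    return root


def different(n1: int, f1: int, n2: int, f2: int, alpha: float) -> bool:
    """Hoeffding-bound test: are the frequencies f1/n1 and f2/n2 significantly different?"""
    if n1 == 0 or n2 == 0:
        return False  # no evidence either way
    bound = math.sqrt(0.5 * math.log(2.0 / alpha)) * (
        1 / math.sqrt(n1) + 1 / math.sqrt(n2)
    )
    return abs(f1 / n1 - f2 / n2) > bound


def compatible(a: Node, b: Node, alpha: float) -> bool:
    """Recursively check whether states a and b could be merged."""
    # Termination frequencies must agree within the bound.
    if different(a.arrivals, a.ends, b.arrivals, b.ends, alpha):
        return False
    for sym in set(a.children) | set(b.children):
        fa = a.children[sym].arrivals if sym in a.children else 0
        fb = b.children[sym].arrivals if sym in b.children else 0
        # Transition frequencies for this symbol must agree within the bound.
        if different(a.arrivals, fa, b.arrivals, fb, alpha):
            return False
        # Successors reached by the same symbol must be compatible as well.
        if sym in a.children and sym in b.children:
            if not compatible(a.children[sym], b.children[sym], alpha):
                return False
    return True
```

For example, `build_pta([["the", "of"], ["the"], ["a", "of"]])` yields a root whose `"the"` child has `arrivals == 2` and `ends == 1`. The bound shrinks as $1/\sqrt{n}$, so states that are visited only a few times are almost always judged compatible. With large counts, even moderate frequency differences block a merge.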
