Publication Date

Spring 2012

Degree Type

Master's Project

Degree Name

Master of Science (MS)


Computer Science


Most enterprise search engines use data mining classifiers to categorize documents. With economic globalization, many companies now have overseas branches or divisions, and those branches write documents and emails in local languages. A classifier trained on a single language does not work when it is asked to categorize documents written in another language. The most direct solution is to machine-translate those documents into the language the classifier was trained on. However, this solution suffers from the inaccuracy of machine translation, and the translation overhead makes it economically inefficient. Another approach is to translate the features extracted from one language into another language and use them to classify documents in that language. This approach is efficient but suffers from translation inaccuracy and the cultural gap between languages. In this project, the author proposes a new method that combines model translation and document translation. The method aims to take advantage of the strengths of both approaches.
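The two baseline approaches described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the keyword-weight model `EN_MODEL`, the toy Spanish-to-English dictionary `ES_TO_EN`, and the function names are hypothetical, and the score-averaging step in `classify_combined` is only one simple way to use both translations together, not the method the author proposes.

```python
from collections import Counter

# Hypothetical per-class keyword weights learned from English training data.
EN_MODEL = {
    "finance": {"invoice": 2.0, "payment": 1.5},
    "travel": {"flight": 2.0, "hotel": 1.5},
}

# Hypothetical toy bilingual dictionary (Spanish -> English).
ES_TO_EN = {
    "factura": "invoice",
    "pago": "payment",
    "vuelo": "flight",
    "hotel": "hotel",
}


def score(tokens, model):
    """Sum the weights of the model features that occur in the tokens."""
    counts = Counter(tokens)
    return {
        label: sum(weight * counts[feat] for feat, weight in feats.items())
        for label, feats in model.items()
    }


def classify_by_document_translation(tokens, model, src_to_tgt):
    """Document translation: translate every token, then score with the original model."""
    translated = [src_to_tgt.get(tok, tok) for tok in tokens]
    return score(translated, model)


def translate_model(model, tgt_to_src):
    """Model translation: translate the features once; documents stay untranslated."""
    return {
        label: {tgt_to_src.get(feat, feat): weight for feat, weight in feats.items()}
        for label, feats in model.items()
    }


def classify_combined(tokens, model, src_to_tgt):
    """Assumed combination: average the scores of both approaches, pick the best class."""
    tgt_to_src = {tgt: src for src, tgt in src_to_tgt.items()}
    doc_scores = classify_by_document_translation(tokens, model, src_to_tgt)
    model_scores = score(tokens, translate_model(model, tgt_to_src))
    combined = {
        label: (doc_scores[label] + model_scores[label]) / 2 for label in model
    }
    return max(combined, key=combined.get)


if __name__ == "__main__":
    print(classify_combined(["factura", "pago", "hotel"], EN_MODEL, ES_TO_EN))
```

The sketch shows the cost trade-off the abstract mentions: document translation does work for every token of every incoming document, while model translation translates the feature vocabulary once. With a real machine translator the two paths would generally disagree, which is where a combination could help.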