Publication Date

Spring 5-26-2016

Degree Type

Master's Project

Degree Name

Master of Science (MS)

Department

Computer Science

First Advisor

Thanh Duc Tran

Second Advisor

Robert Chun

Third Advisor

Sharath Chandra Pilli

Abstract

Entity Matching (EM) is the problem of determining whether two entities in a data set refer to the same real-world object. For example, an EM system uses similarity functions to decide whether two mentions in the data, such as “Helen Hunt” and “H. M. Hunt”, refer to the same real-world entity. This problem plays a key role in information integration, natural language understanding, information processing on the World-Wide Web, and the emerging Semantic Web. This project examines similarity functions and the thresholds applied to their scores to decide whether two entities match. The work has two major parts: the implementation of a hybrid similarity function, which combines three different similarity functions to measure the similarity of entities, and an efficient method for determining the optimum threshold value for a similarity function so that the matching results are accurate.
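
The two parts described in the abstract can be illustrated with a minimal Python sketch. It is not the project's implementation. The abstract does not name the three component functions, how they are combined, or how the threshold is searched, so everything below is an assumption made for illustration: the components (token Jaccard, normalized edit distance, and the ratio from Python's difflib.SequenceMatcher), the equal weights, the F1-based search over labeled pairs, and all function names.

```python
from difflib import SequenceMatcher


def jaccard(a: str, b: str) -> float:
    """Token-set overlap: shared tokens divided by all distinct tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def edit_similarity(a: str, b: str) -> float:
    """1 minus the Levenshtein distance divided by the longer length."""
    a, b = a.lower(), b.lower()
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return 1.0 - prev[-1] / max(len(a), len(b))


def sequence_similarity(a: str, b: str) -> float:
    """Ratio of matching characters as computed by difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def hybrid_similarity(a: str, b: str, weights=(1 / 3, 1 / 3, 1 / 3)) -> float:
    """Weighted average of three scores (assumed components and weights)."""
    scores = (jaccard(a, b), edit_similarity(a, b), sequence_similarity(a, b))
    return sum(w * s for w, s in zip(weights, scores))


def best_threshold(labeled_pairs):
    """Return (threshold, f1) maximizing F1 over the observed scores.

    labeled_pairs is a list of (mention_a, mention_b, is_match) tuples.
    A pair is predicted to match when its hybrid score is >= threshold.
    """
    scored = [(hybrid_similarity(a, b), m) for a, b, m in labeled_pairs]
    best = (0.0, -1.0)
    for t in sorted({s for s, _ in scored}):
        tp = sum(1 for s, m in scored if s >= t and m)
        fp = sum(1 for s, m in scored if s >= t and not m)
        fn = sum(1 for s, m in scored if s < t and m)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best[1]:
            best = (t, f1)
    return best


if __name__ == "__main__":
    # Hypothetical labeled pairs; only the first pair comes from the abstract.
    pairs = [
        ("Helen Hunt", "H. M. Hunt", True),
        ("Helen Hunt", "Helen Hunt", True),
        ("Helen Hunt", "Tom Hanks", False),
    ]
    print(best_threshold(pairs))
```

Trying only the scores that actually occur as candidate thresholds keeps the search exhaustive for the labeled sample, since the predictions can change only at those values. A different quality measure or a held-out validation set could be substituted without changing the structure.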
