Publication Date
Spring 2016
Degree Type
Master's Project
Degree Name
Master of Science (MS)
Department
Computer Science
First Advisor
Thanh Duc Tran
Second Advisor
Robert Chun
Third Advisor
Sharath Chandra Pilli
Keywords
Entity Matching, Semantic Web
Abstract
Entity Matching (EM) is the problem of determining whether two entities in a data set refer to the same real-world object. For example, an EM system uses similarity functions to decide whether two mentions in the data, such as “Helen Hunt” and “H. M. Hunt”, refer to the same real-world entity. This problem plays a key role in information integration, natural language understanding, information processing on the World Wide Web, and the emerging Semantic Web. This project examines the similarity functions, and the thresholds applied to their scores, that are used to decide whether entities match. The work has two major parts: an implementation of a hybrid similarity function, which combines three different similarity functions to determine the similarity of entities, and an efficient method for determining the optimal threshold value for similarity functions so that matching results are accurate.
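The idea of a hybrid similarity function with a tuned threshold can be illustrated with a minimal Python sketch. The abstract does not name the three component functions, their weights, or the threshold-selection method, so everything below is an assumption made for illustration: token Jaccard, normalized edit similarity, and `difflib` sequence ratio stand in for the three functions, the weights are equal, and `best_threshold` is a plain exhaustive search over candidate values, not the efficient method developed in the project.

```python
from difflib import SequenceMatcher


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity in [0, 1] (assumed component 1)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def edit_similarity(a: str, b: str) -> float:
    """1 minus normalized Levenshtein distance (assumed component 2)."""
    a, b = a.lower(), b.lower()
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return 1.0 - prev[-1] / max(len(a), len(b))


def sequence_similarity(a: str, b: str) -> float:
    """difflib matching-block ratio in [0, 1] (assumed component 3)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def hybrid_similarity(a: str, b: str, weights=(1 / 3, 1 / 3, 1 / 3)) -> float:
    """Weighted average of the three components; equal weights are an assumption."""
    scores = (jaccard(a, b), edit_similarity(a, b), sequence_similarity(a, b))
    return sum(w * s for w, s in zip(weights, scores))


def is_match(a: str, b: str, threshold: float) -> bool:
    """Declare a match when the hybrid score reaches the threshold."""
    return hybrid_similarity(a, b) >= threshold


def best_threshold(labeled_pairs, candidates):
    """Return the candidate threshold with the highest F1 on labeled pairs.

    labeled_pairs: iterable of (a, b, is_same_entity) tuples.
    Ties are resolved in favor of the earliest candidate.
    """
    scored = [(hybrid_similarity(a, b), same) for a, b, same in labeled_pairs]
    best_t, best_f1 = None, -1.0
    for t in candidates:
        tp = sum(1 for s, same in scored if s >= t and same)
        fp = sum(1 for s, same in scored if s >= t and not same)
        fn = sum(1 for s, same in scored if s < t and same)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t
```

With a small set of labeled pairs, `best_threshold(pairs, [0.3, 0.5, 0.7])` would pick the cutoff that best separates matches from non-matches, and `is_match("Helen Hunt", "H. M. Hunt", threshold)` would then apply it to the example pair from the abstract.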
Recommended Citation
Gorijala, Vimal Chandra, "Hybrid Similarity Function for Big Data Entity Matching with R-Swoosh" (2016). Master's Projects. 484.
DOI: https://doi.org/10.31979/etd.nck7-c4y7
https://scholarworks.sjsu.edu/etd_projects/484