Publication Date
Fall 2015
Degree Type
Master's Project
Degree Name
Master of Science (MS)
Department
Computer Science
Abstract
Identification of the correct sense for an ambiguous word is one of the major challenges for language processing in all domains. Word Sense Disambiguation is the task of identifying the correct sense of an ambiguous word by referencing the surrounding context of the word. Similar to the narrative documents, clinical documents suffer from ambiguity issues that impact automatic extraction of correct sense from the document. In this project, we propose a graph-based solution based on an algorithm originally implemented by Osmar R. Zaine et al. for word sense disambiguation specifically focusing on clinical text. The algorithm makes use of proposed UMLS Metathesaurus as its source of knowledge. As an enhancement to the existing implementation of the algorithm, this project uses Apache Spark - A Big Data Technology for cluster based distributed processing and performance optimization.
Recommended Citation
Padavkar, Veebha, "GRAPH BASESD WORD SENSE DISAMBIGUATION FOR CLINICAL ABBREVIATIONS USING APACHE SPARK" (2015). Master's Projects. 455.
DOI: https://doi.org/10.31979/etd.tx8m-kzg9
https://scholarworks.sjsu.edu/etd_projects/455
Comments
Identification of the correct sense for an ambiguous word is one of the major challenges for language processing in all domains. Word Sense Disambiguation is the task of identifying the correct sense of an ambiguous word by referencing the surrounding context of the word. Similar to the narrative documents, clinical documents suffer from ambiguity issues that impact automatic extraction of correct sense from the document. In this project, we propose a graph-based solution based on an algorithm originally implemented by Osmar R. Zaine et al. for word sense disambiguation specifically focusing on clinical text. The algorithm makes use of proposed UMLS Metathesaurus as its source of knowledge. As an enhancement to the existing implementation of the algorithm, this project uses Apache Spark - A Big Data Technology for cluster based distributed processing and performance optimization.