Faculty Publications, Computer Science

Word Sense Disambiguation in Biomedical Ontologies With Term Co-occurrence Analysis and Document Clustering

Bill Andreopoulos, Technische Universität Dresden
Dimitra Alexopoulou, Technische Universität Dresden
Michael Schroeder, Technische Universität Dresden

Document Type

Article

Publication Date

September 2008

Publication Title

International Journal of Data Mining and Bioinformatics

Volume

2

Issue Number

3

First Page

193

Last Page

215

Abstract

With more and more genomes being sequenced, a lot of effort is devoted to their annotation with terms from controlled vocabularies such as the GeneOntology. Manual annotation based on relevant literature is tedious, but automation of this process is difficult. One particularly challenging problem is word sense disambiguation. Terms such as "development" can refer to developmental biology or to the more general sense. Here, we present two approaches to address this problem by using term co-occurrences and document clustering. To evaluate our method we defined a corpus of 331 documents on development and developmental biology. Term co-occurrence analysis achieves an F-measure of 77%. Additionally, applying document clustering improves precision to 82%. We applied the same approach to disambiguate "nucleus", "transport", and "spindle", and we achieved consistent results. Thus, our method is a viable approach towards the automation of literature-based genome annotation.

Comments

This article originally appeared in: Andreopoulos, B., Alexopoulou, D., and Schroeder, M. (2008). Word Sense Disambiguation in Biomedical Ontologies With Term Co-occurrence Analysis and Document Clustering. International Journal of Data Mining and Bioinformatics, 2(3), 193-215. Copyright © 2008 Inderscience Enterprises Ltd.

Recommended Citation

Bill Andreopoulos, Dimitra Alexopoulou, and Michael Schroeder. "Word Sense Disambiguation in Biomedical Ontologies With Term Co-occurrence Analysis and Document Clustering" International Journal of Data Mining and Bioinformatics (2008): 193-215. https://doi.org/10.1504/IJDMB.2008.020522

Included in

Bioinformatics Commons, Computer Sciences Commons
