Publication Date

Spring 2017

Degree Type


Degree Name

Master of Library and Information Science (MLIS)




Geoffrey Z. Liu


Keywords

classification, machine learning, weeding

Subject Areas

Library science; Artificial intelligence


Studies have shown that library weeding (the selective removal of unused, worn, outdated, or irrelevant items) benefits patrons and increases circulation rates. However, the time required to review the collection and make weeding decisions is a formidable obstacle. In this study, we empirically evaluated methods for automatically classifying weeding candidates. A data set of 80,346 items from a large-scale academic library weeding project conducted by Wesleyan University from 2011 to 2014 was used to train six machine learning classifiers to predict “Keep” or “Weed” for each candidate. We found statistically significant agreement (p = 0.001) between classifier predictions and librarian judgments for all classifier types. The naive Bayes and linear support vector machine classifiers had the highest recall (the fraction of items weeded by librarians that the algorithm identified), while the k-nearest-neighbor classifier had the highest precision (the fraction of recommended candidates that librarians chose to weed). The most relevant variables were librarian and faculty votes for retention, item age, and the presence of copies in other libraries. Future weeding projects could use the same approach to train a model that quickly identifies the candidates most likely to be withdrawn.
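The abstract defines recall and precision with “Weed” as the positive class. A minimal sketch of those two definitions, assuming labels are the strings "Keep" and "Weed"; the function name `weed_precision_recall` and the sample labels are hypothetical illustrations, not data or code from the Wesleyan project or the thesis itself:

```python
from typing import Sequence, Tuple


def weed_precision_recall(
    librarian: Sequence[str],
    predicted: Sequence[str],
    positive: str = "Weed",
) -> Tuple[float, float]:
    """Return (precision, recall) for the positive class.

    Hypothetical helper: precision = share of items the classifier recommended
    for weeding that librarians also weeded; recall = share of items librarians
    weeded that the classifier also recommended.
    """
    if len(librarian) != len(predicted):
        raise ValueError("label sequences must have equal length")
    # Items both the librarian and the classifier marked for weeding.
    true_pos = sum(1 for l, p in zip(librarian, predicted) if l == positive and p == positive)
    predicted_pos = sum(1 for p in predicted if p == positive)
    actual_pos = sum(1 for l in librarian if l == positive)
    # Guard against division by zero when a class is absent.
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up labels for five candidate items.
    librarian = ["Weed", "Weed", "Keep", "Keep", "Weed"]
    predicted = ["Weed", "Keep", "Keep", "Keep", "Weed"]
    print(weed_precision_recall(librarian, predicted))
```

In this toy example the classifier recommends only two items, both of which librarians weeded (precision 1.0), but it misses one of the three weeded items (recall about 0.67). This illustrates how a conservative classifier can lead on precision while another leads on recall.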