Publication Date

Spring 2013

Degree Type

Master's Project

Degree Name

Master of Science (MS)

Department

Computer Science

Abstract

Since Twitter became widely adopted as an immensely popular microblogging website, many users have come to rely on it as a primary source of news, whether about sports, entertainment, politics, or technology. Earlier work has shown that eliminating stop words improves the clustering of technology-related tweets. The focus of this paper is to further improve the quality of clustering for technology-related tweets by developing a semi-automated approach to eliminating stop words and by combining the Canopy and K-means clustering algorithms. The paper also details an algorithmic approach to determining the threshold values for Canopy clustering.
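The abstract names a pipeline in which Canopy clustering feeds K-means but does not spell out the mechanics. The following is a minimal sketch of that general technique, not the project's implementation. It assumes bag-of-words tweet vectors, cosine distance, a hypothetical hand-written stop-word list (`STOP_WORDS`), and illustrative thresholds `t1 > t2`. The project's semi-automated stop-word list and its threshold-selection algorithm are not reproduced here. Each canopy center seeds one K-means centroid, which fixes both the number of clusters and their starting positions.

```python
import math
from collections import Counter

# Hypothetical stop-word list for illustration only; the project's
# semi-automated list is not reproduced here.
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "on", "for", "rt"}

def vectorize(tweet):
    """Bag-of-words term counts after lower-casing and dropping stop words."""
    return Counter(t for t in tweet.lower().split() if t not in STOP_WORDS)

def cosine_distance(a, b):
    """1 - cosine similarity; empty vectors are treated as maximally distant."""
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)

def canopy(points, t1, t2):
    """Canopy clustering with a loose threshold t1 and a tight threshold t2.

    Returns (center_index, member_indices) pairs. The first remaining point
    is used as the next center (deterministic, rather than random).
    """
    assert t1 > t2, "canopy clustering requires t1 > t2"
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        center = remaining[0]
        dist = {i: cosine_distance(points[center], points[i]) for i in remaining}
        # Points closer than t1 join this canopy (canopies may overlap).
        members = [i for i in remaining if dist[i] < t1]
        if center not in members:
            members.insert(0, center)
        canopies.append((center, members))
        # Points closer than t2, and the center itself, cannot seed new canopies.
        remaining = [i for i in remaining if i != center and dist[i] >= t2]
    return canopies

def centroid(vectors):
    """Mean of sparse term-count vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: c / len(vectors) for t, c in total.items()}

def kmeans(points, initial_centers, iterations=10):
    """Standard K-means (cosine distance), seeded with the given centers."""
    centers = [dict(c) for c in initial_centers]
    assignment = []
    for _ in range(iterations):
        assignment = [
            min(range(len(centers)), key=lambda k: cosine_distance(p, centers[k]))
            for p in points
        ]
        new_centers = []
        for k in range(len(centers)):
            members = [points[i] for i, a in enumerate(assignment) if a == k]
            new_centers.append(centroid(members) if members else centers[k])
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return assignment, centers

tweets = [
    "Apple releases new iPhone",
    "new iPhone from Apple is fast",
    "Google announces Android update",
    "Android update from Google rolls out",
]
points = [vectorize(t) for t in tweets]
canopies = canopy(points, t1=0.8, t2=0.5)  # illustrative thresholds
labels, _ = kmeans(points, [points[c] for c, _ in canopies])
print(labels)  # → [0, 0, 1, 1]
```

Seeding K-means from canopy centers means the number of clusters does not have to be chosen in advance. This is why the choice of `t1` and `t2` matters, and why the paper treats threshold selection as a problem of its own.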