Publication Date

Fall 2011

Degree Type

Master's Project

Degree Name

Master of Science (MS)


Computer Science


The main purpose of this project is to group similar news articles drawn from a wide variety of news sources. Suppose you want to read the latest news on a particular topic, such as sports. Typically, a user visits one news website and reads some articles there, but a single website cannot cover all the news on that topic. The user then moves on to another news website, and then another. Some websites may also still be showing old news, which the user ends up reading. To solve this, I have developed a web application in which the user can see the latest news from different websites in a single place. Users can choose the news websites from which they want to view the latest news. The articles retrieved from these websites arrive in no particular order, so the application applies the DBSCAN algorithm and groups them into clusters, one per topic, for the user to view. A user who wants to read about sports, for example, is shown the sports news cluster. Extracting the articles and forming the news clusters is done at run time, so the application always shows the latest news available on the web. This is an effective way to follow all the news in a single place. It can also serve as an article (URL) recommender, since the user only has to browse the cluster of interest rather than visit every news website to find articles. The idea can be extended beyond news articles to other areas, such as collecting statistics on financial information. Because the processing is done at run time, performance matters. To improve it, the application uses distributed data mining across multiple servers that communicate with each other.
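To make the clustering step concrete, the following is a minimal sketch of DBSCAN applied to a handful of headlines. It is not the project's implementation: the abstract does not say how articles are represented or compared, so the bag-of-words vectors, the cosine distance, the parameter values (eps, min_pts), the function names, and the sample headlines are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

NOISE = -1  # label for articles that belong to no cluster


def vectorize(text):
    """Bag-of-words term counts for one headline (assumed representation)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_distance(a, b):
    """1 - cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def dbscan(vectors, eps, min_pts):
    """Return one cluster label per vector; NOISE marks outliers."""
    labels = [None] * len(vectors)

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [
            j
            for j in range(len(vectors))
            if cosine_distance(vectors[i], vectors[j]) <= eps
        ]

    cluster = -1
    for i in range(len(vectors)):
        if labels[i] is not None:
            continue  # already assigned or marked as noise
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # not a core point; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels


# Hypothetical headlines standing in for articles fetched at run time.
headlines = [
    "Team wins football championship final",
    "Football championship final ends in penalty shootout",
    "Star striker injured before football final",
    "Central bank raises interest rates",
    "Interest rates rise as central bank acts",
    "Markets react to central bank interest rates decision",
    "Local bakery opens new shop",
]

# eps and min_pts are illustrative values, not tuned settings.
labels = dbscan([vectorize(h) for h in headlines], eps=0.7, min_pts=2)
for label, headline in zip(labels, headlines):
    print(label, headline)
```

With these assumed settings the three football headlines fall into one cluster, the three central bank headlines into another, and the bakery headline is left as noise. DBSCAN only groups the articles; attaching a topic name such as "sports" to a cluster would need a separate step that the abstract does not describe.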