Publication Date

Fall 2015

Degree Type

Master's Project


Computer Science


The volume of structured and unstructured data has grown exponentially in recent years. As a result of this rapid growth, users are inundated with a plethora of choices for almost any product or service, and it is easy to get lost among so many options and find it hard to make decisions. This project addresses the problem through entity recommendation. It concentrates on two main aspects: presenting more accurate entity recommendations to the user, and handling the vast amount of underlying data. The goal is to return recommendation results for a user's query efficiently and accurately. The project uses the ListNet ranking algorithm to rank the recommendation results. Query-independent and query-dependent features are combined to compute ranking scores, and these scores determine the order in which the results are presented to the user. The project is built on Apache Spark, a distributed big data processing framework. Compared with the traditional MapReduce paradigm, Spark handles iterative and interactive algorithms efficiently and with less processing time.

We evaluated the recommendation engine using DBpedia as the dataset and tested the results in the movie domain. For ranking, we used both a query-independent feature (PageRank) and a query-dependent feature (click logs). We observed that ListNet performs well on Apache Spark, because RDDs (Resilient Distributed Datasets) can be cached in memory and reused across iterations, which lets iterative algorithms execute faster. We also observed that the recommendation results are accurate and the entities are well ranked.
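The following is a minimal, single-machine sketch of the ListNet idea described above, not the project's Spark implementation. It assumes a linear scoring function over two features per entity: a query-independent PageRank value and a query-dependent, normalized click-through value. ListNet turns both the relevance labels and the model scores into "top-one" probabilities with a softmax, then minimizes the cross-entropy between the two distributions by gradient descent. The function names, the learning rate, the epoch count, and the toy feature values and labels are all hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Each training query: (feature vectors of candidate entities, relevance labels).
Query = Tuple[List[List[float]], List[float]]


def softmax(values: Sequence[float]) -> List[float]:
    """Top-one probabilities used by ListNet (shifted by the max for stability)."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear scoring function f(x) = w . x (an assumption; ListNet allows other models)."""
    return sum(w * x for w, x in zip(weights, features))


def listnet_loss(weights: Sequence[float], queries: Sequence[Query]) -> float:
    """Cross-entropy between label and model top-one distributions, summed over queries."""
    loss = 0.0
    for feats, labels in queries:
        p_true = softmax(labels)
        p_model = softmax([score(weights, x) for x in feats])
        loss -= sum(pt * math.log(pm) for pt, pm in zip(p_true, p_model))
    return loss


def train_listnet(queries: Sequence[Query], num_features: int,
                  lr: float = 0.5, epochs: int = 300) -> List[float]:
    """Full-batch gradient descent on the ListNet loss (hypothetical hyperparameters)."""
    weights = [0.0] * num_features
    for _ in range(epochs):
        grad = [0.0] * num_features
        for feats, labels in queries:
            p_true = softmax(labels)
            p_model = softmax([score(weights, x) for x in feats])
            # dLoss/dw = sum_j (P_model(j) - P_true(j)) * x_j
            for pm, pt, x in zip(p_model, p_true, feats):
                for k in range(num_features):
                    grad[k] += (pm - pt) * x[k]
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


def rank(weights: Sequence[float], entities: Sequence[str],
         feats: Sequence[Sequence[float]]) -> List[str]:
    """Order entities by descending ListNet score."""
    order = sorted(range(len(entities)),
                   key=lambda i: score(weights, feats[i]), reverse=True)
    return [entities[i] for i in order]


# Hypothetical toy data: features are [pagerank, click_through], labels are graded relevance.
queries = [
    ([[0.4, 0.8], [0.2, 0.2], [0.3, 0.5]], [2.0, 0.0, 1.0]),
    ([[0.8, 0.5], [0.2, 0.3], [0.5, 0.4]], [2.0, 0.0, 1.0]),
]
weights = train_listnet(queries, num_features=2)
print(rank(weights, ["Movie A", "Movie B", "Movie C"], queries[0][0]))
# prints ['Movie A', 'Movie C', 'Movie B']
```

In a Spark setting, the per-query gradient terms would be computed in parallel over a cached RDD and summed on each iteration; the in-memory caching is what makes this repeated pass cheap compared with rereading the data in MapReduce.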