Publication Date

Fall 2014

Degree Type

Master's Project

Degree Name

Master of Science (MS)


Department

Computer Science

First Advisor

Robert Chun

Second Advisor

Chris Pollett

Third Advisor

Nikolay Varbanets


Keywords

hierarchical k-means clustering, stock prediction, artificial intelligence


Abstract

We gathered over 3,100 annual financial reports for 500 companies listed on the S&P 500 index. The main goal was to select and properly weight the various pieces of quantitative data in order to maximize clustering quality and improve prediction results over previous work by [Lin et al. 2011]. Various financial ratios, including earnings-per-share surprise percentages, were gathered and analyzed. We proposed and used two types of ratios: correlation-based ratios and causality-based ratios. We proposed an extension to the classification scheme used by [Lin et al. 2011] to classify financial reports more accurately, together with a more outlier-tolerant normalization technique. We proved that our proposed data scaling/normalization method is superior to the method used by [Lin et al. 2011]. We focused heavily on the relative importance of the various financial ratios: we proposed a new method for determining it and showed that the resulting weights aligned with theoretical expectations. Using this new weighting scheme, we achieved higher cluster purities than the method proposed by [Lin et al. 2011]. Achieving higher cluster purity in the initial stages of analysis led to less over-fitting by a modified version of K-Means and to better prediction accuracy on average.
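
The abstract compares methods by cluster purity but does not define it. As a point of reference only, the sketch below computes purity in its standard form: each cluster is credited with the count of its most common class, and the credits are summed and divided by the total number of items. The function name `cluster_purity`, the class labels "up" and "down", and the toy data are all hypothetical and are not taken from the project, which may compute purity differently.

```python
from collections import Counter


def cluster_purity(cluster_ids, class_labels):
    """Return the fraction of items that share the majority class of their cluster."""
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have the same length")
    if not cluster_ids:
        raise ValueError("at least one item is required")

    # Count class labels separately inside each cluster.
    counts = {}
    for cluster, label in zip(cluster_ids, class_labels):
        counts.setdefault(cluster, Counter())[label] += 1

    # Sum each cluster's majority-class count, then divide by the total.
    majority_total = sum(max(c.values()) for c in counts.values())
    return majority_total / len(cluster_ids)


# Hypothetical toy data: six reports in two clusters, labeled by price movement.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["up", "up", "down", "down", "down", "down"]
purity = cluster_purity(clusters, labels)
```

A purity of 1.0 means every cluster contains a single class. Purity generally rises as the number of clusters grows, so comparisons are most meaningful at a fixed cluster count.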