Automated Medical Diagnosis from Clinical Data

Publication Date


Document Type

Conference Proceeding

Publication Title

2017 IEEE Third International Conference on Big Data Computing Service and Applications (BigDataService)



First Page


Last Page



A significant portion of the world's population does not have access to proper healthcare. The key factor in the success of healthcare is the physician's expertise. In this paper, we examine whether that expertise can be modeled as an information corpus, a flavor of Big Data, and extracted using text mining techniques, particularly the Vector Space Model, to perform diagnosis. Using cloud and mobile technologies, medical diagnosis could then be made available wherever there is Internet connectivity, reducing costs, increasing coverage, and improving quality of life. Whether medical diagnosis can be performed with an information retrieval approach depends chiefly on the data. This paper therefore focuses on the suitability of the dataset for automating diagnosis using text mining. We apply various text mining tools relevant to the Vector Space Model to the data to see whether meaningful conclusions can be drawn from it. We present some observations from the experiments conducted and conclude with future directions.


Medical Diagnosis, Information Retrieval, Machine Learning, Text Mining, Vector Space Model, TF-IDF, Cluster Analysis, K-means
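
The abstract describes diagnosis as an information retrieval problem over a Vector Space Model with TF-IDF weighting. The following is a minimal sketch of that general idea, not the authors' implementation: it ranks a handful of conditions by cosine similarity between a symptom query and TF-IDF vectors of their descriptions. The corpus, the condition names, the smoothed IDF formula, and the function names (`tokenize`, `build_idf`, `tfidf_vector`, `cosine`, `rank_conditions`) are all assumptions made up for illustration and do not come from the paper or its dataset.

```python
import math
from collections import Counter

# Hypothetical toy corpus: condition name -> free-text description.
# Invented for illustration only; not the dataset studied in the paper.
CORPUS = {
    "influenza": "fever cough sore throat muscle aches fatigue",
    "migraine": "severe headache nausea sensitivity to light",
    "gastroenteritis": "nausea vomiting diarrhea stomach cramps fever",
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def build_idf(docs):
    """Compute a smoothed inverse document frequency for every term."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    # Smoothing choice (an assumption): log((1 + n) / (1 + df)) + 1
    return {term: math.log((1 + n) / (1 + count)) + 1.0
            for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Map tokens to a sparse {term: tf * idf} vector; unknown terms are dropped."""
    tf = Counter(tokens)
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_conditions(query, corpus=CORPUS):
    """Return (condition, score) pairs sorted from most to least similar."""
    names = list(corpus)
    docs = [tokenize(corpus[name]) for name in names]
    idf = build_idf(docs)
    doc_vectors = [tfidf_vector(doc, idf) for doc in docs]
    query_vector = tfidf_vector(tokenize(query), idf)
    scores = [(name, cosine(query_vector, vec))
              for name, vec in zip(names, doc_vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_conditions("fever cough fatigue"):
        print(f"{name}: {score:.3f}")
```

Calling `rank_conditions` with a symptom string returns every condition with its similarity score, so a caller can inspect the full ranking rather than a single answer. A production system would need a far richer corpus, clinical vocabulary normalization, and validation that this sketch does not attempt.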




Applied Data Science