Publication Date

Fall 2019

Degree Type

Master's Project

Degree Name

Master of Science (MS)


Computer Science

First Advisor

Wendy Lee

Second Advisor

Natalia Khuri

Third Advisor

Mike Wu


Text Extraction, Drug data, NLP, Machine Learning


Inadequate experimental drug data and the use of unlicensed drugs may cause adverse drug reactions, especially in pediatric populations. Every year, the U.S. Food and Drug Administration approves human prescription drugs for marketing. The labels associated with these drugs include information about clinical trials and drug response in pediatric populations. For doctors to make informed decisions about the safety and effectiveness of these drugs for children, complex and often unstructured drug labels must be analyzed. In this work, first, an exploratory analysis of drug labels is performed using a Natural Language Processing pipeline. Second, Machine Learning algorithms are employed to build baseline binary classification models that identify pediatric text in unstructured drug labels. Third, a series of experiments is executed to evaluate the accuracy of the models. The prototype classifies pediatrics-related text with a recall of 0.93 and a precision of 0.86.
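
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how precision and recall are computed for a binary classifier. It is not taken from the project itself: the function name `precision_recall`, the label encoding (1 = pediatric text, 0 = other text), and the toy label lists are all assumptions made for illustration, and the toy values are unrelated to the results reported above.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive class.

    y_true and y_pred are equal-length sequences of labels.
    Hypothetical helper for illustration only.
    """
    pairs = list(zip(y_true, y_pred))
    # True positives: pediatric text correctly flagged as pediatric.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: other text wrongly flagged as pediatric.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: pediatric text the classifier missed.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels (assumed, not project data): 1 = pediatric, 0 = other.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0, 1, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under this reading, a recall of 0.93 means that about 93% of truly pediatric passages are retrieved, while a precision of 0.86 means that about 86% of the passages flagged as pediatric are in fact pediatric.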