Publication Date

Fall 2021

Degree Type

Thesis

Degree Name

Master of Science (MS)

Department

Electrical Engineering

Advisor

Birsen Sirkeci

Subject Areas

Electrical engineering

Abstract

Machine learning used in the medical industry can potentially detect cancer in human cells at an early stage. However, training machine learning models, especially deep learning models, requires thousands to millions of samples to reach an acceptable accuracy level. It is well known that obtaining medical data is tedious; hence, in most cases, medical datasets contain a limited number of samples. One solution to this problem is transfer learning, for example using networks pretrained on another dataset. Another solution is to increase the number of training data points with data augmentation. Common data augmentation methods for images include not only simple techniques, such as transforming images by rotation and flipping, but also generative adversarial networks (GANs). However, one critical question is “Does the original dataset have enough samples to train a GAN?” In most scenarios, the answer is “No.” In this thesis, we propose a two-level data augmentation technique (simple data augmentation based on image transformations, followed by a GAN) combined with transfer learning, which is tested on a small dataset of cancer cell images. The dataset used in this research consists of lung and colon cancer samples, each containing different types of cancer. Only part of the original dataset is used in the experiments in order to mimic a small-dataset environment. Our results show that the proposed method achieves an accuracy of 94.1% even when 150 original images are used for training. This is very close to the 97.33% accuracy achieved when all of the available training data, 12,000 samples, is used.
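
The first augmentation level, simple image transformations such as rotation and flipping, can be illustrated with a minimal sketch. The abstract does not say which transformations or how many variants per image the thesis uses, so the eight rotation/flip variants below are an assumption made only for illustration. The function names (`rotate90`, `flip_horizontal`, `simple_augment`) are hypothetical, and a tiny nested list stands in for a real image.

```python
def rotate90(image):
    """Rotate a 2-D grid (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def flip_horizontal(image):
    """Mirror a 2-D grid left to right."""
    return [row[::-1] for row in image]


def simple_augment(image):
    """Return 8 variants: four rotations, each with and without a mirror.

    Assumption: this particular set of transforms is illustrative only;
    the thesis abstract names rotation and flipping but not the exact set.
    """
    variants = []
    current = [list(row) for row in image]  # copy so the input is untouched
    for _ in range(4):                       # 0, 90, 180, 270 degrees
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants


# Hypothetical 2x2 "image"; each entry could equally be an (R, G, B) tuple.
tiny = [[1, 2],
        [3, 4]]
variants = simple_augment(tiny)

# With 150 original images, this assumed scheme would give 150 * 8 images
# to pass on to the second level (the GAN).
originals = 150
print(originals * len(variants))  # 1200
```

In this two-level arrangement, the enlarged set produced by the simple transformations is what the GAN is trained on, which is how the method addresses the question of whether the original dataset is large enough to train a GAN.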
