Publication Date
Fall 2021
Degree Type
Master's Project
Degree Name
Master of Science (MS)
Department
Computer Science
First Advisor
Robert Chun
Second Advisor
Christopher Pollett
Third Advisor
Stuti Patel
Keywords
Algorithm, AUC, Classification-based, Churn, Confusion Matrix, Machine Learning Models, Logistic Regression, Precision, Recall, ROC Curve, Sensitivity, Specificity, Support Vector Machine, Supervised
Abstract
Retaining existing employees is a greater challenge for a Human Resources (HR) team than hiring new ones. For any company, losing valuable employees is costly in terms of time, money, productivity, and trust. This loss could be minimized if HR could identify in advance the employees who are planning to quit; hence, we investigated the employee churn problem from a machine learning perspective. We designed machine learning models using supervised classification algorithms such as Logistic Regression and Support Vector Machine (SVM). The models were trained on the IBM HR employee dataset retrieved from https://kaggle.com and then fine-tuned to improve their performance. Metrics such as precision, recall, the confusion matrix, the ROC curve, and AUC were used to compare the performance of the models. The Logistic Regression model recorded an accuracy of 0.67, Sensitivity of 0.65, Specificity of 0.70, Type I Error of 0.30, Type II Error of 0.35, and an AUC score of 0.73, whereas SVM achieved an accuracy of 0.93 with Sensitivity of 0.98, Specificity of 0.88, Type I Error of 0.12, Type II Error of 0.01, and an AUC score of 0.96.
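The rate metrics reported above are all derived from the four counts of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the project's own code: the function name `confusion_metrics` and the counts passed to it are hypothetical and chosen only for illustration. It assumes "positive" means an employee who leaves, Type I Error is the false positive rate, and Type II Error is the false negative rate.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Derive common rate metrics from binary confusion-matrix counts.

    tp/fn: leavers predicted correctly / missed (assumed positive class).
    tn/fp: stayers predicted correctly / wrongly flagged as leavers.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),    # recall, true positive rate
        "specificity": tn / (tn + fp),    # true negative rate
        "type_i_error": fp / (fp + tn),   # false positive rate = 1 - specificity
        "type_ii_error": fn / (fn + tp),  # false negative rate = 1 - sensitivity
    }


# Hypothetical counts for illustration only; not taken from the project.
metrics = confusion_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Under these definitions, Sensitivity and Type II Error sum to 1, as do Specificity and Type I Error, so each pair conveys the same information from opposite directions.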
Recommended Citation
Maharjan, Rajendra, "Employee Churn Prediction using Logistic Regression and Support Vector Machine" (2021). Master's Projects. 1043.
DOI: https://doi.org/10.31979/etd.3t5h-excq
https://scholarworks.sjsu.edu/etd_projects/1043