Publication Date

7-1-2025

Document Type

Article

Publication Title

Electronics (Switzerland)

Volume

14

Issue

14

DOI

10.3390/electronics14142787

Abstract

Automating candidate shortlisting is a non-trivial task that stands to benefit substantially from advances in artificial intelligence. We evaluate a suite of foundation models (Llama 2, Llama 3, Mixtral, Gemma-2b, Gemma-7b, Phi-3 Small, Phi-3 Mini, Zephyr, and Mistral-7b) for their ability to predict hiring outcomes in both zero-shot and few-shot settings. Using only features extracted from applicants’ submissions, these models achieved, on average, an AUC above 0.5 in the zero-shot setting. Providing a few examples, retrieved via nearest-neighbor search for applicants similar to the one under evaluation, improved prediction performance marginally, indicating that the models perform competently even without task-specific fine-tuning. For Phi-3 Small and Mixtral, all reported performance metrics fell within the 95% confidence interval across evaluation strategies. Model outputs were interpreted quantitatively via post hoc explainability techniques and qualitatively through prompt engineering, revealing that decisions are largely attributable to knowledge acquired during pre-training. A task-specific MLP classifier trained solely on the provided dataset outperformed the strongest foundation model (Zephyr in the 5-shot setting) by only approximately 3 percentage points in accuracy, while all the foundation models outperformed this baseline by more than 15 percentage points in F1 and recall, underscoring the competitive strength of general-purpose language models in the hiring domain.
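
The few-shot setup described in the abstract retrieves labeled examples similar to each applicant before prompting the model. The following is a minimal sketch of that retrieval-and-prompting step, assuming applicant submissions have already been reduced to fixed-length feature vectors; the feature representation, distance metric, prompt format, and all function names are illustrative assumptions, not the paper's exact pipeline.

```python
# Hypothetical sketch: nearest-neighbor selection of few-shot examples
# for a hiring-outcome prompt. Assumes numeric feature vectors per
# applicant; the paper's actual features and prompt wording may differ.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def select_few_shot_examples(train_features, train_labels, query_feature, k=5):
    """Return the k labeled training examples closest to the query applicant."""
    nn = NearestNeighbors(n_neighbors=k, metric="euclidean")
    nn.fit(train_features)
    _, idx = nn.kneighbors(query_feature.reshape(1, -1))
    return [(train_features[i], train_labels[i]) for i in idx[0]]

def build_prompt(examples, query_feature):
    """Assemble a few-shot prompt: labeled neighbors followed by the query."""
    lines = []
    for features, label in examples:
        outcome = "HIRED" if label == 1 else "NOT HIRED"
        lines.append(f"Applicant features: {features.tolist()}\nOutcome: {outcome}\n")
    lines.append(f"Applicant features: {query_feature.tolist()}\nOutcome:")
    return "\n".join(lines)

# Toy usage with random data standing in for extracted applicant features:
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 8))
y_train = rng.integers(0, 2, size=100)
query = rng.normal(size=8)
shots = select_few_shot_examples(X_train, y_train, query, k=5)
print(build_prompt(shots, query))
```

In the paper's 5-shot setting, k would be 5; the zero-shot setting corresponds to issuing the query portion of the prompt alone, with no retrieved examples.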

Keywords

explainable AI (XAI), few-shot learning, large language models (LLMs), recruitment, zero-shot learning

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Department

Applied Data Science
