Detecting Phishing URLs using the BERT Transformer Model

Publication Date

1-1-2023

Document Type

Conference Proceeding

Publication Title

Proceedings - 2023 IEEE International Conference on Big Data, BigData 2023

DOI

10.1109/BigData59044.2023.10386782

First Page

2483

Last Page

2492

Abstract

Phishing websites often mimic benign websites with the objective of luring unsuspecting users into visiting them. Visits may be driven by links in phishing emails, links on web pages, or web search results. Although the precise motivations behind phishing websites may differ, the common denominator is that unsuspecting users are usually required to take some action, e.g., clicking on a particular Uniform Resource Locator (URL). To accurately identify phishing websites, the cybersecurity community has relied on a variety of approaches, including blacklisting, heuristic techniques, and content-based methods, among others. These identification techniques are often enhanced with an array of methods, e.g., honeypots, feature recognition, manual reporting, and web crawlers. Nevertheless, a number of phishing websites still escape detection, either because they are not blacklisted, are too recent, or were incorrectly evaluated. It is therefore imperative to enhance solutions that can mitigate the threat of phishing websites. In this study, the effectiveness of the Bidirectional Encoder Representations from Transformers (BERT) model is investigated as a possible tool for detecting phishing URLs. The experimental results show that the BERT transformer model achieves acceptable prediction results without requiring advanced URL feature selection techniques or the involvement of a domain specialist.
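For illustration only, the sketch below shows one common way a BERT classifier can be fine-tuned on raw URL strings with the Hugging Face Transformers library; it is not the authors' code, and the model checkpoint, example URLs, and hyperparameters are assumptions.

    # Minimal illustrative sketch: fine-tuning BERT to classify URLs as
    # phishing (1) or benign (0). Dataset and settings are hypothetical.
    import torch
    from transformers import BertTokenizerFast, BertForSequenceClassification

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    model = BertForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2)

    # Hypothetical labeled URLs; the WordPiece tokenizer splits them into
    # sub-word tokens, so no hand-crafted URL features are needed.
    urls = ["http://secure-login.example-bank.com.verify-account.ru/",
            "https://www.wikipedia.org/"]
    labels = torch.tensor([1, 0])
    enc = tokenizer(urls, padding=True, truncation=True,
                    max_length=64, return_tensors="pt")

    # One training step: the model returns a cross-entropy loss when labels
    # are supplied.
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    model.train()
    out = model(**enc, labels=labels)
    out.loss.backward()
    optimizer.step()

    # Inference: argmax over the two logits gives the predicted class.
    model.eval()
    with torch.no_grad():
        preds = model(**enc).logits.argmax(dim=-1)

In practice the full labeled URL corpus would be batched and trained over several epochs; the point of the sketch is simply that the URL string is fed to the model as ordinary text.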

Funding Number

2319802

Funding Sponsor

National Science Foundation

Keywords

BERT Transformer Language Model, Phishing URLs, Social Engineering

Department

Computer Science
