Publication Date

9-1-2025

Document Type

Article

Publication Title

Information Switzerland

Volume

16

Issue

9

DOI

10.3390/info16090758

Abstract

Language modeling has evolved from simple rule-based systems into complex assistants capable of tackling a multitude of tasks. State-of-the-art large language models (LLMs) score highly on proficiency benchmarks and, as a result, have been deployed across industries to increase productivity and convenience. However, the proliferation of such tools has given threat actors the ability to leverage them for attack development. Our paper describes the current state of LLMs, their availability, and their role in benevolent and malicious applications. In addition, we propose how an LLM can be combined with text-to-speech (TTS) voice cloning to create a framework capable of carrying out social engineering attacks. Our case study analyzes the realism of two open-source TTS models, Tortoise TTS and Coqui XTTS-v2, by calculating similarity scores between generated and real audio samples from four participants. Our results demonstrate that Tortoise can generate realistic voice-clone audio for native English-speaking males, indicating that easily accessible resources can be leveraged to create deceptive social engineering attacks. As such tools become more advanced, defenses such as awareness, detection, and red teaming may not be able to keep up with dangerously equipped adversaries.

Funding Number

2319803

Funding Sponsor

National Science Foundation

Keywords

AI misuse, cybersecurity, digital assistant, generative AI, large language models, voice cloning

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Department

Computer Science; Economics
