Faculty Research, Scholarly, and Creative Activity

A Comparative Study of Linear and Non-Linear Dimensionality Reduction for Opcode-Frequency Malware Classification

Publication Date

1-28-2026

Document Type

Article

Publication Title

Journal of Computer Virology and Hacking Techniques

Volume

Issue

DOI

10.1007/s11416-026-00597-1

Abstract

High-dimensional feature spaces in malware classification pose significant challenges for machine learning performance. To address these challenges, this paper presents a comparative evaluation of four dimensionality-reduction techniques applied to opcode-frequency representations of malware: Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Uniform Manifold Approximation and Projection (UMAP), and autoencoder-based reduction. Using a corpus comprising 82,569 samples and 1,796 opcodes, we analyze the effect of each reduction method across multiple target dimensions and two classifier architectures: Extreme Gradient Boosting (XGBoost) and a three-layer Multilayer Perceptron (MLP). Results show that LDA achieves strong separability at lower dimensions, while PCA performs best at higher dimensions where variance preservation is critical. Autoencoder-based reduction provides consistently high accuracy with compact representations, whereas UMAP exhibits limited benefits for tabular opcode data. The findings highlight trade-offs between linear and non-linear reduction strategies and provide guidance for selecting efficient feature compression methods in large-scale malware analysis.

Keywords

Dimensionality reduction, Machine learning, Malware classification

Department

Computer Science

Recommended Citation

Chandler Lu and Fabio Di Troia. "A Comparative Study of Linear and Non-Linear Dimensionality Reduction for Opcode-Frequency Malware Classification" Journal of Computer Virology and Hacking Techniques (2026). https://doi.org/10.1007/s11416-026-00597-1

