Author

Suyash Pasari

Publication Date

Fall 2025

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

Fabio Di Troia

Second Advisor

William Andreopoulos

Third Advisor

Ravi Teja

Keywords

Encrypted Traffic Detection, HTTPS Malware, TLS Metadata, X.509 Features, Machine Learning, Zeek Logs

Abstract

Encrypted HTTPS traffic now dominates the Internet, and malware increasingly uses TLS to conceal command-and-control activity. Because payloads cannot be inspected, detection must rely on metadata such as TLS handshake fields and certificate attributes, which, as prior work has shown, can still reveal malicious behavior. This research evaluates whether malicious HTTPS connections can be detected using only metadata from Zeek logs. Using the CTU-SME-11 dataset, we build a reproducible preprocessing pipeline and a 33-feature connection-level representation capturing flow statistics, TLS behavior, and certificate validity characteristics. We evaluate XGBoost, multilayer perceptrons, and several CNN variants (including 1D and 2D grid-based embeddings), using a stratified capture-level split and 5-fold capture-aware cross-validation to prevent leakage. Results show strong discriminative performance: XGBoost achieves the highest ROC-AUC and PR-AUC, while CNN-based models, particularly an 8×8 architecture, achieve the strongest malicious-class F1-scores. These findings show that metadata-based models can accurately detect encrypted malicious traffic and motivate future work on generalization, calibration, and explainability.
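The idea behind capture-aware cross-validation can be illustrated with a minimal sketch. It assumes each connection record carries the identifier of the capture it came from. Every connection from a given capture is then placed in the same fold, so no capture contributes to both training and validation. The helper `capture_aware_folds` is hypothetical and is not the project's actual pipeline. It also only groups by capture and does not perform the class stratification mentioned above. In practice, scikit-learn's `GroupKFold` or `StratifiedGroupKFold` provide this kind of split.

```python
from collections import defaultdict


def capture_aware_folds(capture_ids, k=5):
    """Hypothetical helper: yield (train_idx, val_idx) pairs in which all
    connections from the same capture fall into the same fold."""
    # Group connection indices by the capture they were recorded in.
    by_capture = defaultdict(list)
    for idx, cap in enumerate(capture_ids):
        by_capture[cap].append(idx)

    if len(by_capture) < k:
        raise ValueError("need at least k distinct captures for k folds")

    # Assign the largest captures first, each to the currently smallest fold,
    # which keeps fold sizes roughly balanced.
    folds = [[] for _ in range(k)]
    for cap in sorted(by_capture, key=lambda c: len(by_capture[c]), reverse=True):
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(by_capture[cap])

    # Each fold serves once as validation; the remaining folds form training.
    for f in range(k):
        val_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train_idx, val_idx
```

Splitting by capture rather than by individual connection matters because connections from the same capture share hosts, certificates, and timing patterns. A per-connection split would let near-duplicates leak from training into validation and inflate the reported scores.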

Available for download on Saturday, December 19, 2026
