Faculty Research, Scholarly, and Creative Activity

Clinically Aligned Long-Context Transformers for Cross-Platform Mental Health Risk Detection

Publication Date

3-27-2026

Document Type

Article

Publication Title

Electronics (Switzerland)

Volume

Issue

DOI

10.3390/electronics15071403

Abstract

Social media platforms contain rich but noisy narratives of psychological distress, creating opportunities for early mental health risk detection. However, existing datasets capture heterogeneous constructs such as suicide risk severity, depression diagnosis, and DSM-5 symptom presence, and most prior models are trained and evaluated on a single corpus, limiting their clinical alignment and cross-dataset generalizability. In this study, we fine-tune a domain-specific long-document transformer, AIMH/Mental-Longformer-base-4096, for binary mental health risk detection (risk vs. no risk) using two clinically aligned Reddit datasets: the C-SSRS Reddit corpus and the eRisk 2025 depression dataset. To handle long user histories, we introduce an LLM-based summarization pipeline that compresses posts exceeding 2000 tokens while preserving mental health-relevant information. We also conduct a seven-configuration ablation study across combinations of three corpora (C-SSRS, eRisk, and ReDSM5) to examine how dataset semantics influence model performance. On a held-out C-SSRS + eRisk test set (n = 279), the proposed model achieves a mean balanced accuracy of 0.89 ± 0.01 across five random seeds, with a best run of 0.90 and a 5.74 percentage point improvement over the strongest baseline (TF-IDF + Random Forest). The model also shows strong cross-platform generalization, achieving BA = 0.78 on the depression-reddit-cleaned dataset (n = 7731) and BA = 0.85 (ROC-AUC = 0.92) on a Twitter suicidal-intention dataset (n = 9119) without additional fine-tuning. The ablation analysis shows that although a three-dataset configuration (C-SSRS + eRisk + ReDSM5) maximizes aggregate performance, the ReDSM5 labels encode symptom presence rather than clinical risk, creating a semantic mismatch. This finding highlights the importance of label compatibility when combining heterogeneous mental health corpora. 
Explainability analysis using Integrated Gradients and attention visualization shows that the model focuses on clinically meaningful expressions such as therapy references, diagnosis, and hopelessness rather than isolated keywords. These results demonstrate that clinically aligned long-context transformers can provide accurate and interpretable mental health risk detection from social media while emphasizing the critical role of dataset semantics in multi-corpus training.
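
The headline metric in the abstract, balanced accuracy (BA) reported as a mean ± spread over five random seeds, can be illustrated with a minimal sketch. This is not the authors' evaluation code. The function names, the toy labels, and the choice of sample standard deviation for the "±" term are assumptions made for illustration only; the abstract does not state which spread statistic was used.

```python
from statistics import mean, stdev


def balanced_accuracy(y_true, y_pred):
    """Average of sensitivity and specificity for binary labels.

    Labels are assumed to be 1 = risk, 0 = no risk (illustrative convention).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # recall on the risk class
    specificity = tn / (tn + fp)  # recall on the no-risk class
    return 0.5 * (sensitivity + specificity)


def summarize_runs(scores):
    """Mean and sample standard deviation of per-seed scores (assumed statistic)."""
    return mean(scores), stdev(scores)


# Hypothetical toy labels, not data from the paper.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
print(balanced_accuracy(y_true, y_pred))
```

Because BA weights the risk and no-risk classes equally, it is less affected by class imbalance than plain accuracy, which is one reason it suits held-out and cross-platform test sets whose class proportions differ.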

Keywords

C-SSRS, depression, DSM-5, encoder fine-tuning, eRisk, explainable AI, Integrated Gradients, long-document classification, Longformer, mental health, Reddit, social media, suicide risk, transformer, Twitter, zero-shot transfer

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Department

Applied Data Science

Recommended Citation

Aditya Tekale and Mohammad Masum. "Clinically Aligned Long-Context Transformers for Cross-Platform Mental Health Risk Detection" Electronics (Switzerland) (2026). https://doi.org/10.3390/electronics15071403
