Towards Detecting and Quantifying Identity-Based Polarization in Online Content: A Deep-Learning Approach

Publication Date

1-1-2023

Document Type

Conference Proceeding

Publication Title

Proceedings - 2023 22nd IEEE/WIC International Conference on Web Intelligence and Intelligent Agent Technology, WI-IAT 2023

DOI

10.1109/WI-IAT59888.2023.00098

First Page

593

Last Page

599

Abstract

Identity-based polarization is prevalent across many news outlets, yet it goes unnoticed by most readers. It differs greatly from typical ideology-based polarization, which arises when two political parties disagree solely over policy. In this paper, we focus on this distinct kind of polarization, which we call identity-based polarization: the act of refusing to focus on differences in ideology and instead focusing on individual or group identities such as race, sexuality, and gender to form extreme opinions. The never-ending stream of online content from social media and news outlets poses significant challenges to accurately understanding, detecting, and quantifying identity-based polarization. If one cannot quantify it, one cannot manage it, and the health of our society, especially that of the younger generation, suffers greatly. Past research has addressed detecting polarization in areas such as political bias, but such studies relied solely on sentiment analysis. We take a vastly different approach: we first implement an entity-recognition system to detect identities in an article, and then attempt to attribute polarization to the recognized entities using sentiment analysis. To do so, this paper leverages BERT (Bidirectional Encoder Representations from Transformers) and NLP (Natural Language Processing) techniques, combining sentiment analysis with a customized NER (Named-Entity Recognition) system. In this way, it takes a novel and more scientific approach to accurately identifying and quantifying identity-based polarization. Using news article data from five major news outlets, the final NER model achieved an F1 score of over 83%, and we uncovered a strong correlation between the identity density of a text and the extreme sentiment expressed in it, with correlations ranging from +0.48 to +0.68.
This work is a first step towards managing identity-based polarization in online content; it has significant societal and cultural implications, and may be readily extended to other kinds of polarization and applied to other public media.
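
The correlation between identity density and extreme sentiment reported in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that identity density means identity mentions per token, that sentiment extremity means the absolute value of a polarity score in [-1, 1], and that the correlation is Pearson's r; the abstract specifies none of these. The lexicon `IDENTITY_TERMS` is a hypothetical stand-in for the paper's BERT-based NER model, and all function names are illustrative.

```python
import math
import re

# Hypothetical stand-in for the paper's customized BERT-based NER model:
# a tiny lexicon of identity terms, used only to make the sketch runnable.
IDENTITY_TERMS = {"women", "men", "black", "white", "gay", "immigrants"}


def identity_density(text: str) -> float:
    """Fraction of tokens that are identity terms (assumed definition)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in IDENTITY_TERMS)
    return hits / len(tokens)


def sentiment_extremity(polarity: float) -> float:
    """Distance of a polarity score in [-1, 1] from neutral (assumed definition)."""
    return abs(polarity)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up toy articles and polarity scores, for illustration only.
    articles = [
        "The council approved the new budget on Tuesday.",
        "Immigrants and women were blamed for the decline.",
        "Local schools reopen after the storm.",
    ]
    polarities = [0.05, -0.9, 0.1]  # would come from a sentiment model
    densities = [identity_density(a) for a in articles]
    extremities = [sentiment_extremity(p) for p in polarities]
    print(pearson(densities, extremities))
```

In a fuller pipeline, the lexicon lookup would be replaced by the output of a trained NER model and the polarity scores by a sentiment classifier; the correlation step would be unchanged.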

Keywords

BERT, correlation, identity-based polarization, media bias, news articles, NLP, sentiment analysis

Department

Computer Science
