Gun Violence News Information Retrieval using BERT as Sequence Tagging Task
Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021
The growth in both the frequency and severity of gun violence in the United States has necessitated increased research into prevention, despite a lack of funding. Comprising more than 60k gun violence media articles with a total size of 520 MB, the gun violence database (GVDB) was developed to assist natural language processing researchers in developing and testing prevention methods. The original research based on the GVDB used a span-selection model to extract shooter and victim information, but that approach may trim out important span candidates. We proposed a new approach that aims to improve identification accuracy by recognizing every token in a sentence with a sequence tagging technique. We implemented a token-level BIO sequence tagging model using BERT, then further classified each token using LSTM, BiLSTM, and CRF. We found that using BERT as an embedding layer, and decoding word representations as a sequence tagging task, improved shooter/victim identification compared to a span-selection model. We believe that if this improved model is combined with gun-violence-related keywords, automated techniques could be implemented to identify precursors of, and risks for, gun violence on social media, allowing law enforcement or community agencies to intervene before violence escalates to deaths.
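To make the sequence tagging formulation concrete, the following is a minimal sketch (not the authors' code) of how shooter and victim spans can be encoded as one BIO tag per token and decoded back into spans. It assumes whitespace-split word tokens and hypothetical label names SHOOTER and VICTIM; the helper names `spans_to_bio` and `bio_to_spans` are illustrative only. In the model described above, a BERT encoder followed by an LSTM, BiLSTM, or CRF layer would predict these tags; the sketch covers only the labeling scheme, not the model.

```python
from typing import List, Optional, Tuple

# Hypothetical span annotation: (start_token, end_token_exclusive, label)
Span = Tuple[int, int, str]


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Encode labeled token spans as one BIO tag per token."""
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"  # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # tokens inside the entity
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO tags back into (start, end_exclusive, label) spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        continues_span = tag.startswith("I-") and tag[2:] == label
        if not continues_span:
            # Close any span that is currently open.
            if label is not None:
                spans.append((start, i, label))
                start, label = None, None
            # B- opens a span; an orphan I- is leniently treated as B-.
            if tag.startswith("B-") or tag.startswith("I-"):
                start, label = i, tag[2:]
    if label is not None:
        spans.append((start, len(tags), label))
    return spans


if __name__ == "__main__":
    tokens = "Police said John Smith shot Jane Doe on Tuesday".split()
    tags = spans_to_bio(len(tokens), [(2, 4, "SHOOTER"), (5, 7, "VICTIM")])
    for token, tag in zip(tokens, tags):
        print(f"{token}\t{tag}")
    print(bio_to_spans(tags))  # [(2, 4, 'SHOOTER'), (5, 7, 'VICTIM')]
```

Because every token receives a tag, the tagger never has to choose from a pre-selected list of candidate spans, which is the contrast with span selection drawn in the abstract. A real BERT pipeline would additionally need to align these word-level tags with WordPiece subword tokens.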
BERT, BiLSTM, CRF, gun violence, natural language processing, NLP, sequence tagging, transformer
Computer Science; Justice Studies
Hung Yeh Lin, Teng Sheng Moh, and Bryce Westlake. "Gun Violence News Information Retrieval using BERT as Sequence Tagging Task" Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021 (2021): 2525-2531. https://doi.org/10.1109/BigData52589.2021.9671919