Finding BERT Errors Using Activation Vectors

Publication Date

1-1-2023

Document Type

Conference Proceeding

Publication Title

Proceedings - IEEE 9th International Conference on Big Data Computing Service and Applications, BigDataService 2023

DOI

10.1109/BigDataService58306.2023.00010

First Page

25

Last Page

32

Abstract

Deep learning models are considered black boxes: their nested, non-linear structure makes their internal workings opaque. This opacity makes it difficult to interpret the reasons behind their output, reducing trust in, and the verifiability of, the systems where these models are applied. This paper presents a systematic approach for identifying the clusters with the most misclassifications or incorrect label annotations. We extracted activation vectors from a deep learning model, DNABERT, and visualized them using t-SNE to understand the reasons behind the model's results. We applied K-means hierarchically to the activation vectors of a set of training instances, then analyzed the cluster mean activation vectors to find patterns in the errors across K-means clusters. The cluster analysis revealed that predictions were uniform, or nearly 100 percent identical, within clusters of similar activation vectors. Two clusters whose objects mostly belong to the same true class tend to be closer together than clusters of opposite classes. Across two clusters, the means of objects with the same true label are closer when the clusters share a predicted label than when their predicted labels are opposite, showing that the activation vectors reflect both predicted and true classes. We propose a heuristic that uses between-cluster and within-cluster mean vector analysis to find clusters with a high number of misclassifications or incorrect label annotations. This can aid in identifying misclassifications of DNA sequences or problems with sequence tagging.
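The kind of analysis the abstract describes can be illustrated with a small sketch. It is not the authors' implementation: the 2-D toy vectors stand in for DNABERT activation vectors, the labels are invented for illustration, and the helper names (`kmeans`, `hierarchical_kmeans`, `summarize_clusters`) are hypothetical. The abstract does not specify the proposed heuristic, so ranking clusters by error rate here is only a simple stand-in for flagging clusters that merit inspection.

```python
import math
from collections import Counter


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kmeans(vectors, k, iters=20):
    """Plain Lloyd's K-means with deterministic farthest-point initialisation.

    Returns one cluster index per input vector.
    """
    centers = [vectors[0]]
    while len(centers) < k:
        # Next center: the vector farthest from its nearest existing center.
        centers.append(
            max(vectors, key=lambda v: min(math.dist(v, c) for c in centers))
        )
    assign = [0] * len(vectors)
    for _ in range(iters):
        assign = [
            min(range(k), key=lambda j: math.dist(v, centers[j])) for v in vectors
        ]
        for j in range(k):
            members = [v for v, a in zip(vectors, assign) if a == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = mean_vector(members)
    return assign


def hierarchical_kmeans(vectors, k=2, depth=2):
    """Apply K-means recursively; each instance gets a path such as (0, 1)."""
    paths = [()] * len(vectors)

    def split(indices, level):
        if level == depth or len(indices) < k:
            return
        sub = kmeans([vectors[i] for i in indices], k)
        for j in range(k):
            child = [i for i, a in zip(indices, sub) if a == j]
            for i in child:
                paths[i] = paths[i] + (j,)
            split(child, level + 1)

    split(list(range(len(vectors))), 0)
    return paths


def summarize_clusters(vectors, true_labels, pred_labels, paths):
    """Per-cluster mean vector, prediction uniformity, and error rate."""
    summary = {}
    for path in sorted(set(paths)):
        idx = [i for i, p in enumerate(paths) if p == path]
        preds = Counter(pred_labels[i] for i in idx)
        errors = sum(true_labels[i] != pred_labels[i] for i in idx)
        summary[path] = {
            "size": len(idx),
            "mean": mean_vector([vectors[i] for i in idx]),
            # Share of instances carrying the cluster's majority prediction.
            "uniformity": preds.most_common(1)[0][1] / len(idx),
            "error_rate": errors / len(idx),
        }
    return summary


# Toy stand-ins for activation vectors: four tight groups in 2-D (assumed data).
vectors = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (0.0, 2.0), (0.1, 2.0), (0.0, 2.1),
    (10.0, 0.0), (10.1, 0.0), (10.0, 0.1),
    (10.0, 2.0), (10.1, 2.0), (10.0, 2.1),
]
true_labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0]  # invented labels
pred_labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]  # uniform per cluster

paths = hierarchical_kmeans(vectors, k=2, depth=2)
summary = summarize_clusters(vectors, true_labels, pred_labels, paths)

# Clusters with the highest error rate come first.
ranked = sorted(summary, key=lambda p: summary[p]["error_rate"], reverse=True)

# Distance between cluster means, for between-cluster comparisons.
near = math.dist(summary[(0, 0)]["mean"], summary[(0, 1)]["mean"])
far = math.dist(summary[(0, 0)]["mean"], summary[(1, 0)]["mean"])
print(ranked, near, far)
```

In practice the vectors would be high-dimensional activations taken from the model, and a library implementation of K-means would replace the hand-written one. The pure-Python version is used here only to keep the sketch free of dependencies.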

Keywords

activation vectors, attention, BERT, clustering, K-means, t-SNE, transformer

Department

Computer Science
