Classifying Speech Acts using Multi-channel Deep Attention Network for Task-oriented Conversational Search Agents

Document Type

Conference Proceeding

Publication Title

CHIIR 2021 - Proceedings of the 2021 Conference on Human Information Interaction and Retrieval




Abstract

Understanding human spoken dialogues in an information-seeking scenario is a significant challenge for IR researchers. Prior literature on intelligent systems suggests that by identifying the speech acts in spoken dialogues, we can identify a user's search intent and information needs. In this paper, we therefore use speech acts to address the problem of natural language understanding in conversational search systems. First, we collected human-system interaction data through a Wizard-of-Oz study. Next, we developed a gold-standard dataset in which the human-system conversations are labeled with the corresponding speech acts. Finally, we built a Multi-channel Deep Attention Network (MDAN) to identify speech acts in information-seeking dialogues. The best-performing model predicts speech acts with 90.2% accuracy. MDAN outperforms not only all traditional machine learning baselines but also the state-of-the-art single-channel BERT, by 3.3 absolute points. An ablation analysis shows the impact of MDAN's three channels, individually and in combination; the best performance is achieved using all three channels for speech act prediction.
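The abstract describes a multi-channel architecture whose channel representations are combined via attention before classification. The paper's actual channels and layers are not given here, so the following is only a minimal NumPy sketch of the general idea: three assumed channel vectors are attention-weighted, fused, and passed to a softmax over speech-act labels. All dimensions, weights, and channel names are illustrative placeholders, not the authors' model.

```python
import numpy as np

# Hypothetical sketch of multi-channel attention fusion (not the paper's
# architecture): three channel vectors for one utterance are weighted by
# a learned attention score, fused, then classified over speech acts.

rng = np.random.default_rng(0)
d, n_classes = 8, 5          # illustrative hidden size and label count

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# One representation per channel (placeholder random features)
channels = np.stack([rng.standard_normal(d) for _ in range(3)])  # (3, d)

# Attention over channels; random weights stand in for trained ones
w_att = rng.standard_normal(d)      # scoring vector
scores = channels @ w_att           # one score per channel, shape (3,)
alpha = softmax(scores)             # channel weights, sum to 1
fused = alpha @ channels            # weighted combination, shape (d,)

# Linear classifier over the fused representation
W, b = rng.standard_normal((n_classes, d)), np.zeros(n_classes)
probs = softmax(W @ fused + b)      # distribution over speech-act labels
```

Dropping a channel here amounts to masking its attention weight, which mirrors the kind of per-channel ablation the abstract reports.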


Keywords

conversational information retrieval, conversational search systems, deep neural network, intelligent personal assistants, speech acts, spoken search