Publication Date

Summer 2024

Degree Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Engineering

Advisor

Jorjeta Jetcheva; Carlos Rojas; Gheorghi Guzun

Abstract

Recent advances in natural language processing (NLP) and large language models (LLMs) have enabled systems that generate human-like responses across a wide range of tasks. However, most of this research has focused on English and has overlooked the world's linguistic diversity. True global inclusivity requires extending this work to other languages, which could significantly benefit sectors such as business, healthcare, government, and education. A major obstacle to this expansion is the scarcity of digital data and the limited number of pre-trained models for low-resource languages. Our research addresses these challenges by developing conversational agents that improve cross-cultural communication in four low-resource Indian languages: Hindi, Gujarati, Marathi, and Bengali. We propose integrating a translation model into the conversational model pipeline to extend the agent's capabilities to these languages. We apply transfer learning and zero-shot multilingual techniques to improve the translation quality of the conversational agent. Transfer learning yielded measurable improvements in translation, and the zero-shot multilingual method improved language comprehension. Human evaluation of the system also produced positive feedback, supporting the effectiveness of our models in enabling more inclusive and accessible communication across linguistic and cultural contexts.
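To make the idea of a translation model inside the conversational pipeline concrete, here is a minimal sketch of a generic translate, respond, translate-back (pivot) loop. It is an illustration under stated assumptions, not the thesis's implementation: the class `TranslationPipeline`, the functions `toy_translate` and `toy_respond`, and the lookup table are hypothetical stand-ins for a neural machine translation model and an English-centric LLM, and English is assumed to be the pivot language.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interfaces: a translator maps (text, source_lang, target_lang) -> text,
# and a responder maps an English prompt to an English reply.
Translator = Callable[[str, str, str], str]
Responder = Callable[[str], str]

# ISO 639-1 codes for Hindi, Gujarati, Marathi, and Bengali.
SUPPORTED = {"hi", "gu", "mr", "bn"}


@dataclass
class TranslationPipeline:
    translate: Translator
    respond: Responder
    pivot: str = "en"  # assumption: the conversational model works in English

    def chat(self, user_text: str, user_lang: str) -> str:
        if user_lang not in SUPPORTED:
            raise ValueError(f"unsupported language: {user_lang}")
        # 1. Translate the user's message into the pivot language.
        pivot_in = self.translate(user_text, user_lang, self.pivot)
        # 2. Let the pivot-language conversational model generate a reply.
        pivot_out = self.respond(pivot_in)
        # 3. Translate the reply back into the user's language.
        return self.translate(pivot_out, self.pivot, user_lang)


# Toy stand-ins for real models, for demonstration only.
TOY_TABLE = {
    ("hi", "en", "नमस्ते"): "hello",
    ("en", "hi", "Hello! How can I help?"): "नमस्ते! मैं कैसे मदद कर सकता हूँ?",
}


def toy_translate(text: str, src: str, tgt: str) -> str:
    # Returns the input unchanged when the table has no entry.
    return TOY_TABLE.get((src, tgt, text), text)


def toy_respond(prompt: str) -> str:
    if prompt.lower() == "hello":
        return "Hello! How can I help?"
    return "Sorry, I did not understand."


pipeline = TranslationPipeline(toy_translate, toy_respond)
print(pipeline.chat("नमस्ते", "hi"))
```

Keeping the translator and responder as plain callables means either model can be swapped (for example, a fine-tuned translation model produced by transfer learning) without changing the pipeline logic.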
