Publication Date
Fall 2023
Degree Type
Thesis
Degree Name
Master of Science (MS)
Department
Computer Engineering
Advisor
Jorjeta Jetcheva; Magdalini Eirinaki; Mahima Agumbe Suresh
Abstract
Presentation slides are widely used for conveying information in academic and professional contexts. However, manual slide creation can be time-consuming. Our research focuses on automated slide generation, specifically for scientific research papers. Automating the creation of presentation slides for scientific documents is a relatively novel task, so limited training data is available. The task is further constrained by the token limits of language models such as BERT, which has a maximum sequence length of 512 tokens. In this study, we fine-tune large language models, including Longformer-Encoder-Decoder (supporting sequences up to 16,384 tokens) and BIGBIRD-Pegasus (supporting sequences up to 4,096 tokens). We tackle this task using two approaches, one based on abstractive summarization and the other on hybrid summarization. We use PS5K, one of the largest datasets available for automatic slide generation from scientific papers. Our research shows that a model supporting a longer maximum sequence length performs better when working with entire documents. This approach yielded superior results, particularly when the model was trained on section-slide pairs, achieving higher ROUGE-2 (R2) and ROUGE-L (RL) scores, which indicates enhanced coherence compared to other experiments.
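The R2 and RL scores reported in the abstract are presumably ROUGE-2 (bigram overlap) and ROUGE-L (longest common subsequence), the usual summarization metrics. As a rough illustration of what these metrics measure, the following is a minimal Python sketch. It assumes lowercased whitespace tokenization and F1 scoring, which is a simplification; it is not the evaluation code used in the thesis, and the function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when nothing overlaps."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n=2):
    """Simplified ROUGE-N F1 (assumption: whitespace tokens, lowercased)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """Simplified sentence-level ROUGE-L F1 based on the LCS length."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return f1(lcs_length(cand, ref), len(cand), len(ref))
```

In this sketch, a generated slide text would be passed as `candidate` and the ground-truth slide text as `reference`. Published results are normally computed with an established ROUGE implementation, which may apply stemming and different tokenization, so the numbers would not match exactly.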
Recommended Citation
Gupta, Tanya, "Automatic Presentation Slide Generation Using LLMs" (2023). Master's Theses. 5444.
DOI: https://doi.org/10.31979/etd.2x3f-w7uv
https://scholarworks.sjsu.edu/etd_theses/5444