Intelligent Semantic Extraction and Transformation Pipeline for Large-Scale Multimedia Data Processing

Shih Yu Chang, San Jose State University
Sourab Rajendra Saklecha, San Jose State University
Yiyan Wu, Western University

Abstract

This paper presents a comprehensive framework for integrating semantic technologies into the Extract, Transform, Load (ETL) process to improve the extraction and enrichment of multimedia metadata. The proposed architecture combines a traditional ETL pipeline with semantic techniques and uses ontology mappings to guide the precise extraction and transformation of metadata. At the core of the framework are algorithms that extract metadata from multimedia sources, transform it according to ontological structures, and enrich it with contextualized semantic information. The architecture is evaluated through a performance analysis based on two metrics: Multimedia Metadata Extraction Accuracy and Semantic Enrichment Quality. Experimental results show significant improvements in extraction accuracy and higher enrichment quality when the semantically enriched ETL pipeline is used, underscoring the value of semantic technologies for optimizing data management and metadata processing for multimedia content.
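The stages summarized above (extract metadata, transform it through an ontology mapping, enrich it with contextual information, then load it) can be illustrated with a minimal Python sketch. This is not the authors' algorithm. The mapping table `ONTOLOGY_MAP`, the context table `CONTEXT`, the `ex:`/`dc:`/`ma:` prefixed names, and the functions `extract`, `transform`, `enrich`, and `load` are all hypothetical. Metadata is represented as plain (subject, property, value) tuples rather than stored in a real RDF triple store.

```python
# Hypothetical ontology mapping: normalized raw metadata keys -> ontology properties.
# The prefixed names are illustrative only, not the paper's ontology.
ONTOLOGY_MAP = {
    "title": "dc:title",
    "artist": "dc:creator",
    "duration_s": "ma:duration",
    "format": "dc:format",
}

# Hypothetical contextual knowledge: a (property, value) pair triggers extra facts.
CONTEXT = {
    ("dc:format", "video/mp4"): [("rdf:type", "ma:VideoTrack")],
    ("dc:format", "audio/mpeg"): [("rdf:type", "ma:AudioTrack")],
}


def extract(raw: dict) -> dict:
    """Extract: drop empty fields and normalize keys to lowercase."""
    return {k.strip().lower(): v for k, v in raw.items() if v not in (None, "")}


def transform(resource_id: str, meta: dict, mapping: dict) -> list[tuple]:
    """Transform: map each known key to an ontology property; unmapped keys are dropped."""
    triples = []
    for key, value in meta.items():
        prop = mapping.get(key)
        if prop is not None:
            triples.append((resource_id, prop, value))
    return triples


def enrich(triples: list[tuple], context: dict) -> list[tuple]:
    """Enrich: add contextual facts triggered by (property, value) pairs.
    Assumes values are hashable so they can be used in the lookup key."""
    enriched = list(triples)
    for subject, prop, value in triples:
        for extra_prop, extra_value in context.get((prop, value), []):
            fact = (subject, extra_prop, extra_value)
            if fact not in enriched:  # avoid duplicate facts
                enriched.append(fact)
    return enriched


def load(triples: list[tuple], store: list) -> list:
    """Load: append the final triples to an in-memory stand-in for a metadata store."""
    store.extend(triples)
    return store


# Usage: one hypothetical video item; the empty "codec" field is discarded at extraction.
raw = {"Title": "Keynote", "format": "video/mp4", "duration_s": 3600, "codec": ""}
store: list = []
load(enrich(transform("ex:item1", extract(raw), ONTOLOGY_MAP), CONTEXT), store)
for fact in store:
    print(fact)
```

Keeping each stage a separate pure function mirrors the extract/transform/enrich/load separation described in the abstract and makes it easy to swap in a different ontology mapping or enrichment source without touching the other stages.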