INvadeAI: INteractive ADvertising with Multimodal AI

Publication Date

1-1-2025

Document Type

Conference Proceeding

Publication Title

Proceedings 2025 IEEE Conference on Artificial Intelligence, CAI 2025

DOI

10.1109/CAI64502.2025.00027

First Page

126

Last Page

131

Abstract

In this paper, we introduce INvadeAI, a novel interactive advertising framework that leverages multimodal AI to monetize Large Language Models (LLMs) while enhancing digital advertising experiences. Our multi-stage framework integrates YOLOv8 for product detection in visual media, BLIP3 and OCR for caption generation and text extraction, and multimodal LLMs for product link retrieval. Unified within an interactive user interface, INvadeAI enables seamless product discovery and purchasing, creating a potential new revenue stream for computationally intensive AI systems. We applied a fine-tuned YOLOv8 model trained on 25 custom product classes, which achieved an mAP@50 of 0.535, alongside the base YOLOv8 model's 64 product classes, enabling detection of products across 89 consumer categories. We integrated BLIP3 for image captioning, achieving an average CLIPScore of 0.2852, and used advanced LLMs for product link retrieval. Our implementation of LLaMA3 achieved up to 0.97 accuracy in matching products to their purchase sources. However, ablation studies showed better product link retrieval performance when the BLIP3 and OCR stages were bypassed, which suggests that advanced LLMs can handle multimodal tasks directly. Through our system, we demonstrate a practical application for monetizing advanced AI models by creating value in the e-commerce ecosystem, transforming passive content consumption into revenue-generating opportunities through contextually relevant, non-intrusive product discovery.
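
The abstract pairs a fine-tuned detector (25 custom classes) with the base detector (64 product classes) to cover 89 categories, but it does not say how the two sets of detections are combined. The following is a minimal sketch of one plausible approach, not the authors' method: detections from both models are pooled, and where two boxes overlap with an intersection-over-union (IoU) of at least 0.5, only the higher-scoring one is kept. The 0.5 threshold is the same overlap criterion that the "50" in mAP@50 refers to. The `Detection` type, the `merge_detections` helper, and its threshold default are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    label: str                                # class name (hypothetical values)
    box: tuple[float, float, float, float]    # (x1, y1, x2, y2) in pixels
    score: float                              # detector confidence


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_detections(custom, base, iou_threshold=0.5):
    """Assumed merge rule: pool both models' outputs, then greedily keep the
    highest-scoring box and drop any later box overlapping a kept one."""
    pooled = sorted([*custom, *base], key=lambda d: d.score, reverse=True)
    kept = []
    for det in pooled:
        if all(iou(det.box, k.box) < iou_threshold for k in kept):
            kept.append(det)
    return kept
```

With this rule, a custom-class detection and a base-class detection of the same object collapse into a single entry, so each product would be passed to the later captioning and link-retrieval stages only once.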

Keywords

Image Captioning, Interactive Advertising, LLMs, Object Detection, Optical Character Recognition (OCR)

Department

Applied Data Science
