Please use this identifier to cite or link to this item:
https://dspace.ctu.edu.vn/jspui/handle/123456789/110697
Title: | IMPROVING IMAGE CAPTION USING CLIP |
Other Titles: | CẢI THIỆN CHÚ THÍCH HÌNH ẢNH SỬ DỤNG CLIP |
Authors: | Trần, Công Án; Dương, Thị Yến Nhi |
Keywords: | INFORMATION TECHNOLOGY - HIGH-QUALITY PROGRAM |
Issue Date: | 2024 |
Publisher: | Trường Đại Học Cần Thơ |
Abstract: | Image captioning, the task of generating textual descriptions for visual content, has advanced significantly with the integration of pre-trained vision-language models. This work explores the application of CLIP’s robust cross-modal embeddings in a CLIP-based captioning framework. The proposed method employs CLIP as a foundational model and fine-tunes a lightweight transformer-based decoder on top of CLIP embeddings. By retaining the pre-trained weights of CLIP and adjusting only the "Prefix" and "Decoder" modules, the framework generates contextually rich captions efficiently. The model is evaluated on standard datasets to assess its performance. CLIP-based embeddings address a limitation of traditional image captioning models, namely the need for extensive task-specific training. By exploiting pre-trained representations, this approach reduces computational requirements while enhancing descriptive accuracy and semantic relevance. The method achieves competitive results on standard metrics such as CIDEr, BLEU, and SPICE, demonstrating substantial improvements in caption quality and relevance. This research highlights the potential of CLIP-based architectures for building efficient and high-performing image captioning systems. Specifically, on Conceptual Captions, the ROUGE-L, CIDEr, and SPICE scores of CLIP + GPT2 are 26.71, 87.26, and 18.5, with a training time of 65 hours. On COCO Captions, the B@4, METEOR, CIDEr, and SPICE scores of CLIP + GPT2; transformer are 33.53, 28.43, 113.08, and 21.05, with a training time of 6 hours. |
Description: | 37 pp. |
URI: | https://dspace.ctu.edu.vn/jspui/handle/123456789/110697 |
Appears in Collections: | Trường Công nghệ Thông tin & Truyền thông |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
_file_ Restricted Access | | 1.08 MB | Adobe PDF | |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.