IMAGE CAPTIONING USING INCEPTION-V4, LSTM MODELS AND BERT EMBEDDINGS

Huỳnh, Quang Nhật Hào

Please use this identifier to cite or link to this item: https://dspace.ctu.edu.vn/jspui/handle/123456789/84781

Title:	IMAGE CAPTIONING USING INCEPTION-V4, LSTM MODELS AND BERT EMBEDDINGS
Other Titles:	XÂY DỰNG CÂU MÔ TẢ CHO HÌNH ẢNH SỬ DỤNG MÔ HÌNH INCEPTION-V4, LSTM VÀ BERT EMBEDDINGS
Authors:	Lâm, Nhựt Khang; Huỳnh, Quang Nhật Hào
Keywords:	INFORMATION TECHNOLOGY - HIGH QUALITY
Issue Date:	2022
Publisher:	Trường Đại Học Cần Thơ
Abstract:	Image captioning combines image recognition techniques with natural language processing models to generate captions for photos. In this thesis, we experiment with several models that automatically create image descriptions using Inception-v4, LSTM models, and BERT embeddings. Specifically, the Inception-v4 model extracts image features, which are then fed into the LSTM and BERT embeddings model to generate captions. We run experiments on the Flickr8k dataset in English and Vietnamese and evaluate the models using the BLEU metric. The experimental results show that combining Inception-v4, LSTM models, and BERT embeddings achieves better BLEU scores than the other models tested. The BLEU-1, BLEU-2, BLEU-3, and BLEU-4 scores of this combination are 0.689, 0.479, 0.3649, and 0.267 on the English Flickr8k dataset and 0.647, 0.501, 0.332, and 0.271 on the Vietnamese Flickr8k dataset.
Description:	42 pages
URI:	https://dspace.ctu.edu.vn/jspui/handle/123456789/84781
Appears in Collections:	Trường Công nghệ Thông tin & Truyền thông
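
The abstract reports BLEU-1 through BLEU-4 scores but does not say how they were computed. As a rough illustration of the metric only, the following is a minimal sketch of sentence-level BLEU, assuming whitespace-tokenized captions, uniform n-gram weights (cumulative BLEU-n), and no smoothing. The function names (`ngrams`, `modified_precision`, `brevity_penalty`, `bleu`) and the sample captions are hypothetical and are not taken from the thesis; the thesis may have used a different toolkit or a corpus-level variant.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of the candidate against all references."""
    cand_counts = ngrams(candidate, n)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # For each n-gram, keep the largest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / total


def brevity_penalty(candidate, references):
    """Penalize candidates shorter than the closest reference length."""
    c = len(candidate)
    if c == 0:
        return 0.0
    # Closest reference length; ties go to the shorter reference.
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    return 1.0 if c > r else math.exp(1 - r / c)


def bleu(candidate, references, max_n):
    """Cumulative BLEU-max_n with uniform weights and no smoothing (an assumption)."""
    precisions = [modified_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, references) * math.exp(log_mean)


# Hypothetical generated caption and reference captions, for illustration only.
generated = "a dog runs on the grass".split()
refs = ["a dog runs on the grass".split(), "a brown dog is running outside".split()]
scores = [bleu(generated, refs, n) for n in range(1, 5)]
```

Because this sketch is unsmoothed, any caption with no matching n-gram of some order scores 0 at that order and above, which is one reason reported BLEU-4 values are typically lower than BLEU-1.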

Files in This Item:

File	Description	Size	Format
_file_ Restricted Access		1.56 MB	Adobe PDF