Please use this identifier to cite or link to this item: https://dspace.ctu.edu.vn/jspui/handle/123456789/119563
Full metadata record
DC Field	Value	Language
dc.contributor.author	Nguyen, Van Thinh	-
dc.contributor.author	Tran, Van Lang	-
dc.contributor.author	Van, The Thanh	-
dc.date.accessioned	2025-07-31T02:05:09Z	-
dc.date.available	2025-07-31T02:05:09Z	-
dc.date.issued	2024	-
dc.identifier.issn	1813-9663	-
dc.identifier.uri	https://dspace.ctu.edu.vn/jspui/handle/123456789/119563	-
dc.description.abstract	Recent image captioning works often focus on global features or individual object regions within the image without exploiting the relational information between them, which limits accuracy. In this paper, the proposed image captioning model leverages the relationships between objects in the image to better understand its content and improve accuracy. The approach proceeds in the following steps. First, objects in the image are detected using an object detection model combined with a graph convolutional network (GCN). A relationship prediction model based on relational context information and knowledge then classifies the relationships between objects, producing a relationship graph that represents the image. Next, a dual attention mechanism is built so that, when generating captions, the model can focus on relevant parts of both the object regions and the vertices of the relationship graph. Finally, an LSTM network with dual attention is trained to generate captions from the image representation and the given captions. Experiments on the MS COCO and Visual Genome datasets demonstrate that the proposed model achieves higher accuracy than baseline methods and some recently published works.	vi_VN
dc.language.iso	en	vi_VN
dc.relation.ispartofseries	Journal of Computer Science and Cybernetics; Vol.40, No.04, p.327-346	-
dc.subject	Image captioning	vi_VN
dc.subject	Object detection	vi_VN
dc.subject	Visual relationship	vi_VN
dc.subject	Attention mechanism	vi_VN
dc.subject	Deep neural network	vi_VN
dc.title	OD-VR-Cap: Image captioning based on detecting and predicting relationships between objects	vi_VN
dc.type	Article	vi_VN
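The dual attention step described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the paper's exact formulation: the feature dimensions, the random features standing in for object regions and GCN-derived graph vertices, and the bilinear scoring function are all illustrative assumptions.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, features, W):
    # score each feature against the query via a bilinear form,
    # then return the attention-weighted context vector
    scores = features @ (W @ query)          # shape: (n_features,)
    weights = softmax(scores)
    return weights @ features, weights

# hypothetical dimensions: LSTM hidden state, region features, graph-vertex features
d_h, d_v, d_g = 8, 6, 6
rng = np.random.default_rng(0)
h = rng.normal(size=d_h)                     # decoder hidden state at step t
regions = rng.normal(size=(5, d_v))          # detected object-region features
graph_nodes = rng.normal(size=(4, d_g))      # relationship-graph vertices (e.g., from a GCN)

# illustrative learned projection matrices, one per attention branch
W_v = rng.normal(size=(d_v, d_h))
W_g = rng.normal(size=(d_g, d_h))

ctx_v, a_v = attend(h, regions, W_v)         # attention over object regions
ctx_g, a_g = attend(h, graph_nodes, W_g)     # attention over graph vertices
context = np.concatenate([ctx_v, ctx_g])     # combined context fed to the LSTM
print(context.shape)                         # (12,)
```

At each decoding step the two branches produce separate context vectors, whose concatenation conditions the LSTM's next word prediction; each branch's attention weights sum to 1 over its own feature set.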
Appears in Collections:Tin học và Điều khiển học (Journal of Computer Science and Cybernetics)

Files in This Item:
File: _file_ (Restricted Access)
Size: 953.32 kB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.