Integrating image features with convolutional sequence-to-sequence network for multilingual visual question answering

Please use this identifier to cite or link to this item: https://dspace.ctu.edu.vn/jspui/handle/123456789/119552

Title:	Integrating image features with convolutional sequence-to-sequence network for multilingual visual question answering
Authors:	Triet, M. Thai; Son, T. Luu
Keywords:	Visual question answering; Sequence-to-sequence learning; Multilingual; Multimodal
Year of publication:	2024
Series/Report no.:	Journal of Computer Science and Cybernetics; Vol. 40, No. 02, pp. 117-134
Abstract:	Visual question answering is a task that requires a computer to give correct answers to input questions about an image. Humans solve this task with ease, but it remains a challenge for computers. The VLSP2022-EVJVQA shared task brings visual question answering to the multilingual domain with the newly released UIT-EVJVQA dataset, in which the questions and answers are written in three languages: English, Vietnamese, and Japanese. We approached the challenge as a sequence-to-sequence learning task, integrating hints from pre-trained state-of-the-art VQA models and image features with a convolutional sequence-to-sequence network to generate the desired answers. Our approach achieved F1 scores of up to 0.3442 on the public test set and 0.4210 on the private test set.
URI:	https://dspace.ctu.edu.vn/jspui/handle/123456789/119552
ISSN:	1813-9663
Collection:	Tin học và Điều khiển học (Journal of Computer Science and Cybernetics)
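The abstract reports results as F1 scores but does not define the metric. As a hedged illustration only, assuming the token-overlap F1 commonly used to score free-form answers (as in SQuAD-style evaluation), per-answer F1 and its test-set mean could be sketched as below. The function names `token_f1` and `mean_f1` are hypothetical, and lowercasing plus whitespace tokenization are assumptions; Japanese answers are not space-delimited, so they would need a word segmenter or character-level tokens instead.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between one predicted and one reference answer.

    Assumption: SQuAD-style F1 with lowercasing and whitespace tokenization,
    not necessarily the exact metric used by the shared task.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a perfect match; only one empty scores zero.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection: each shared token counts at most min(count) times.
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def mean_f1(predictions, references) -> float:
    """Average per-answer token F1 over a test set (hypothetical helper)."""
    scores = [token_f1(p, r) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, `token_f1("a red car", "the red car")` shares two of three tokens on each side, so precision and recall are both 2/3 and F1 is about 0.667.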

Files in this item:

File	Description	Size	Format
_file_ Restricted access		2.01 MB	Adobe PDF