VIETNAMESE ABSTRACTIVE TEXT SUMMARIZATION USING A PRE-TRAINED MODEL

Nguyễn, Phúc Hoàng Long

Please use this identifier to cite or link to this item: https://dspace.ctu.edu.vn/jspui/handle/123456789/83368

Title:	VIETNAMESE ABSTRACTIVE TEXT SUMMARIZATION USING A PRE-TRAINED MODEL
Authors:	Lâm, Nhựt Khang; Nguyễn, Phúc Hoàng Long
Keywords:	INFORMATION TECHNOLOGY - HIGH QUALITY
Year of publication:	2021
Publisher:	Trường Đại Học Cần Thơ (Can Tho University)
Abstract:	A huge amount of data is gathered and used every day. The International Data Corporation (IDC) projects that the total amount of digital data circulating worldwide each year will grow from 4.4 zettabytes in 2013 to 180 zettabytes in 2025. With this much data being collected, we need a way to automatically shorten long texts and produce accurate summaries that retain the information we need. Text summarization also reduces reading time and makes searching for specific information faster and more effective. Many Natural Language Processing (NLP) models have been introduced for this purpose. In this thesis, we apply Pre-training with Extracted Gap-sentences for Abstractive Summarization Sequence-to-sequence models (PEGASUS), a state-of-the-art technique, to the summarization task described above. Because PEGASUS models are trained with sampled gap-sentence ratios on both C4 and HugeNews, they can only predict summaries in English. We make the approach work for Vietnamese by adding a new pre-trained model and fine-tuning it with PEGASUS. The result of this thesis is an abstractive Vietnamese summarization model that lets users obtain a summary of a source text.
Description:	44 pp.
Identifier:	https://dspace.ctu.edu.vn/jspui/handle/123456789/83368
Collection:	Trường Công nghệ Thông tin & Truyền thông (School of Information and Communication Technology)

Files in this item:

File	Description	Size	Format
_file_ Restricted access		7.38 MB	Adobe PDF