Please use this identifier to cite or link to this item: https://dspace.ctu.edu.vn/jspui/handle/123456789/83368
Title: VIETNAMESE ABSTRACTIVE TEXT SUMMARIZATION USING A PRE-TRAINED MODEL
Authors: Lâm, Nhựt Khang
Nguyễn, Phúc Hoàng Long
Keywords: Information Technology - High Quality (CÔNG NGHỆ THÔNG TIN - CHẤT LƯỢNG CAO)
Issue Date: 2021
Publisher: Trường Đại Học Cần Thơ (Can Tho University)
Abstract: A huge amount of data is gathered and used every single day. The International Data Corporation (IDC) projects that the total amount of digital data created annually worldwide will grow from 4.4 zettabytes in 2013 to 180 zettabytes in 2025. With such a large amount of data collected, we need a way to automatically shorten long texts and deliver accurate summaries that preserve the essential information. Text summarization also reduces reading time, speeds up the search for specific information, and improves research effectiveness. Many Natural Language Processing (NLP) models have been introduced for this task. In this thesis we apply Pre-training with Extracted Gap-sentences for Abstractive Summarization Sequence-to-sequence models (PEGASUS), a state-of-the-art technique, to solve the summarization task described above. Since the published PEGASUS models are pre-trained with sampled gap-sentence ratios on the C4 and HugeNews corpora, they can only predict summaries in English. We adapt the approach to Vietnamese by pre-training a new model and fine-tuning it with the PEGASUS objective. The result of this thesis is an abstractive Vietnamese summarization model that enables users to obtain a summary of a source text.
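The gap-sentence generation (GSG) objective named in the abstract can be illustrated with a small, self-contained sketch. The function below masks the most "principal" sentences of a document and uses them as the generation target; it substitutes a simple word-overlap score for the ROUGE-1 selection used by PEGASUS, and the function name, mask token, and ratio are illustrative, not the thesis's actual implementation.

```python
# Minimal sketch of the Gap-Sentence Generation (GSG) pre-training idea:
# mask the most "principal" sentences and train the model to generate them.
# A plain word-overlap score stands in for the ROUGE-1 scoring in PEGASUS.

def gsg_example(text, gap_ratio=0.3, mask_token="<mask_1>"):
    """Return (source_with_masks, target_of_masked_sentences)."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    n_gaps = max(1, int(len(sentences) * gap_ratio))

    def overlap_score(i):
        # Word overlap between sentence i and the rest of the document,
        # a crude stand-in for the ROUGE-1 F1 used in the paper.
        words = set(sentences[i].lower().split())
        rest = set(" ".join(s for j, s in enumerate(sentences)
                            if j != i).lower().split())
        return len(words & rest) / max(1, len(words))

    # Pick the n_gaps highest-scoring ("most central") sentences to mask.
    ranked = sorted(range(len(sentences)), key=overlap_score, reverse=True)
    gaps = sorted(ranked[:n_gaps])

    source = ". ".join(mask_token if i in gaps else s
                       for i, s in enumerate(sentences)) + "."
    target = ". ".join(sentences[i] for i in gaps) + "."
    return source, target
```

For example, `gsg_example("The cat sat. The cat ran. Dogs bark loudly.", gap_ratio=0.34)` masks the sentence that shares the most vocabulary with the rest of the text and returns it as the target, mirroring how PEGASUS turns any document into a pseudo-summarization pair without human labels.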
Description: 44 pages
URI: https://dspace.ctu.edu.vn/jspui/handle/123456789/83368
Appears in Collections: Trường Công nghệ Thông tin & Truyền thông (College of Information & Communication Technology)

Files in This Item:
File: _file_
Access: Restricted Access
Size: 7.38 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.