Development of high-performance and large-scale Vietnamese automatic speech recognition systems

Do, Quoc Truong; Pham, Ngoc Phuong; Tran, Hoang Tung; Luong, Chi Mai

Please use this identifier to cite or link to this item: https://dspace.ctu.edu.vn/jspui/handle/123456789/10457

Full metadata record

DC Field	Value	Language
dc.contributor.author	Do, Quoc Truong	-
dc.contributor.author	Pham, Ngoc Phuong	-
dc.contributor.author	Tran, Hoang Tung	-
dc.contributor.author	Luong, Chi Mai	-
dc.date.accessioned	2019-07-31T02:03:12Z	-
dc.date.available	2019-07-31T02:03:12Z	-
dc.date.issued	2018	-
dc.identifier.issn	1813-9663	-
dc.identifier.uri	http://dspace.ctu.edu.vn/jspui/handle/123456789/10457	-
dc.description.abstract	Automatic Speech Recognition (ASR) systems automatically convert human speech into the corresponding transcription. They have a wide range of applications, such as controlling robots, call-center analytics, and voice chatbots. Recent studies on ASR for English have achieved performance that surpasses human ability; these systems were trained on large amounts of data and performed well in many environments. For Vietnamese, there have been many studies on improving the performance of existing ASR systems; however, many of them were conducted on small-scale data, which does not reflect realistic scenarios. Although the corpora used to train these systems were carefully designed to be phonetically balanced, efforts to collect such corpora at a large scale are still limited. In particular, existing works evaluated only a single Vietnamese accent. In this paper, we first describe our efforts in collecting a large data set that covers all three major accents of Vietnam: Northern, Central, and Southern. Then, we detail our ASR system development procedure using the collected data set and evaluate different model architectures to find the best structure for Vietnamese. In the VLSP 2018 challenge, our system achieved the best performance with 6.5% WER, and on our internal test set of more than 10 hours of speech collected in real environments, the system also performs well, with 11% WER.	vi_VN
dc.language.iso	en	vi_VN
dc.relation.ispartofseries	Journal of Computer Science and Cybernetics;Vol.34(04) .- P.335–348	-
dc.subject	ASR	vi_VN
dc.subject	Automatic speech recognition	vi_VN
dc.subject	Vietnamese corpora	vi_VN
dc.subject	Vietnamese Speech recognition	vi_VN
dc.title	Development of high-performance and large-scale Vietnamese automatic speech recognition systems	vi_VN
dc.type	Article	vi_VN
Appears in Collections:	Tin học và Điều khiển học (Journal of Computer Science and Cybernetics)

Files in This Item:

File	Description	Size	Format
_file_		5 MB	Adobe PDF