EXTRACTING INFORMATION FROM REGISTRATION FORM FOR FOREIGN LANGUAGE PROFICIENCY EXAMINATIONS

Phạm, Đức Nguyên

Please use this identifier to cite or link to this item: https://dspace.ctu.edu.vn/jspui/handle/123456789/102043

Title:	EXTRACTING INFORMATION FROM REGISTRATION FORM FOR FOREIGN LANGUAGE PROFICIENCY EXAMINATIONS
Other Titles:	TRÍCH XUẤT THÔNG TIN TỪ PHIẾU ĐĂNG KÝ THI ĐÁNH GIÁ NĂNG LỰC NGOẠI NGỮ
Authors:	Lâm, Nhựt Khang; Phạm, Đức Nguyên
Keywords:	INFORMATION TECHNOLOGY - HIGH QUALITY
Issue Date:	2024
Publisher:	Trường Đại Học Cần Thơ
Abstract:	This thesis addresses the challenge of extracting candidate information from examination registration forms, especially in settings such as competitive exams, where large volumes of forms must be processed efficiently. It proposes Optical Character Recognition (OCR) techniques customized for the demands of examination data processing. Under the topic "Extracting information from foreign language proficiency test registration form", the work covers image processing and also studies word recognition using a CNN model combined with a Transformer architecture, along with training steps intended to improve recognition ability. In this project, we developed a lightweight OCR model tailored to recognizing Vietnamese words and handwriting. The model decodes characters with high accuracy and speed, using a combination of a CNN, a Transformer, and cross-entropy loss. These enhancements address the challenges of Vietnamese handwritten OCR and hold promise for broader applications in image processing and computer vision.
Description:	54 pages
URI:	https://dspace.ctu.edu.vn/jspui/handle/123456789/102043
Appears in Collections:	Trường Công nghệ Thông tin & Truyền thông

Files in This Item:

File	Description	Size	Format
_file_ Restricted Access		2.22 MB	Adobe PDF