Please use this identifier to cite or link to this item: https://dspace.ctu.edu.vn/jspui/handle/123456789/124144
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Lâm, Nhựt Khang | -
dc.contributor.author | Nguyễn, Phước Minh | -
dc.date.accessioned | 2026-01-10T02:50:24Z | -
dc.date.available | 2026-01-10T02:50:24Z | -
dc.date.issued | 2025 | -
dc.identifier.other | B2111936 | -
dc.identifier.uri | https://dspace.ctu.edu.vn/jspui/handle/123456789/124144 | -
dc.description | 70 pp. | vi_VN
dc.description.abstract | Despite significant advancements in assistive technology, communication barriers remain a pervasive challenge for the hearing-impaired community, particularly regarding Vietnamese Sign Language (VSL). Existing recognition systems often face a trade-off: Recurrent Neural Networks (RNNs) struggle with vanishing gradients when modeling long gesture sequences, while Transformer-based models incur quadratic computational costs that hinder real-time deployment on edge devices. To address these limitations, this thesis proposes a comprehensive end-to-end framework leveraging the Mamba State Space Model (SSM), a novel architecture capable of capturing long-range temporal dependencies with linear computational complexity (O(N)), thereby bridging the gap between high accuracy and operational efficiency. The core recognition framework orchestrates two specialized Mamba-based modules: a Temporal Segmenter and a Gesture Classifier. Experimental results demonstrate that the Mamba Segmenter achieves a Mean Intersection over Union (mIoU) of 55.69%, outperforming the Temporal Convolutional Network (TCN) baseline by over 14%, particularly in detecting ambiguous transition states. Furthermore, the Mamba Classifier attains a mean Average Precision (mAP) of 0.9937, surpassing the Bi-LSTM baseline in both stability and inference speed. These results validate the efficacy of Mamba’s Selective Scan mechanism in filtering kinematic noise while retaining crucial semantic context. Beyond theoretical modeling, this study culminates in the deployment of a fully functional real-time application using ONNX Runtime and PyQt6. The system successfully translates continuous VSL streams into natural language text with low latency on standard consumer hardware. This practical implementation demonstrates the feasibility of Mamba SSM as a lightweight, scalable solution for sign language recognition, laying a solid foundation for future large-scale dictionary expansion. | vi_VN
dc.language.iso | en | vi_VN
dc.publisher | Trường Đại Học Cần Thơ (Can Tho University) | vi_VN
dc.subject | INFORMATION TECHNOLOGY - HIGH-QUALITY PROGRAM | vi_VN
dc.title | TEMPORAL SEGMENTATION AND HAND GESTURE RECOGNITION USING MAMBA SSM ARCHITECTURE | vi_VN
dc.title.alternative | PHÂN ĐOẠN THỜI GIAN VÀ NHẬN DẠNG CỬ CHỈ TAY SỬ DỤNG KIẾN TRÚC MAMBA SSM | vi_VN
dc.type | Thesis | vi_VN
Appears in Collections: Trường Công nghệ Thông tin & Truyền thông (College of Information and Communication Technology)

Files in This Item:
File | Description | Size | Format
_file_ (Restricted Access) | | 2.35 MB | Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.