TEMPORAL SEGMENTATION AND HAND GESTURE RECOGNITION USING MAMBA SSM ARCHITECTURE

Nguyễn, Phước Minh

Please use this identifier to cite or link to this item: https://dspace.ctu.edu.vn/jspui/handle/123456789/124144

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Lâm, Nhựt Khang	-
dc.contributor.author	Nguyễn, Phước Minh	-
dc.date.accessioned	2026-01-10T02:50:24Z	-
dc.date.available	2026-01-10T02:50:24Z	-
dc.date.issued	2025	-
dc.identifier.other	B2111936	-
dc.identifier.uri	https://dspace.ctu.edu.vn/jspui/handle/123456789/124144	-
dc.description	70 pages	vi_VN
dc.description.abstract	Despite significant advances in assistive technology, communication barriers remain a pervasive challenge for the hearing-impaired community, particularly for users of Vietnamese Sign Language (VSL). Existing recognition systems face a trade-off: Recurrent Neural Networks (RNNs) struggle with vanishing gradients when modeling long gesture sequences, while Transformer-based models incur quadratic computational costs that hinder real-time deployment on edge devices. To address these limitations, this thesis proposes an end-to-end framework built on the Mamba State Space Model (SSM), an architecture that captures long-range temporal dependencies with linear computational complexity (O(N)), thereby bridging the gap between high accuracy and operational efficiency. The core recognition framework orchestrates two specialized Mamba-based modules: a Temporal Segmenter and a Gesture Classifier. Experimental results show that the Mamba Segmenter achieves a Mean Intersection over Union (mIoU) of 55.69%, outperforming the TCN baseline by over 14%, particularly in detecting ambiguous transition states. The Mamba Classifier attains an mAP of 0.9937, surpassing the Bi-LSTM baseline in both stability and inference speed. These results support the efficacy of Mamba’s Selective Scan mechanism in filtering kinematic noise while retaining crucial semantic context. Beyond modeling, the study culminates in the deployment of a fully functional real-time application using ONNX Runtime and PyQt6. The system translates continuous VSL streams into natural language text with low latency on standard consumer hardware. This practical implementation demonstrates the feasibility of Mamba SSM as a lightweight, scalable solution for sign language recognition and lays a foundation for future large-scale dictionary expansion.	vi_VN
dc.language.iso	en	vi_VN
dc.publisher	Trường Đại Học Cần Thơ	vi_VN
dc.subject	INFORMATION TECHNOLOGY - HIGH QUALITY	vi_VN
dc.title	TEMPORAL SEGMENTATION AND HAND GESTURE RECOGNITION USING MAMBA SSM ARCHITECTURE	vi_VN
dc.title.alternative	PHÂN ĐOẠN THỜI GIAN VÀ NHẬN DẠNG CỬ CHỈ TAY SỬ DỤNG KIẾN TRÚC MAMBA SSM	vi_VN
dc.type	Thesis	vi_VN
Appears in Collections:	Trường Công nghệ Thông tin & Truyền thông

Files in This Item:

File	Description	Size	Format
_file_ Restricted Access		2.35 MB	Adobe PDF

