DEVELOPING A WEB APPLICATION FOR MUSIC GENERATION FROM VIDEO USING NATURAL LANGUAGE AS INTERMEDIARY USING OPEN SOURCE AI MODELS

Phan, Trung Thuận

Please use this identifier to cite or link to this item: https://dspace.ctu.edu.vn/jspui/handle/123456789/124291

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Lâm, Nhựt Khang	-
dc.contributor.author	Phan, Trung Thuận	-
dc.date.accessioned	2026-01-12T08:36:39Z	-
dc.date.available	2026-01-12T08:36:39Z	-
dc.date.issued	2025	-
dc.identifier.other	B2111957	-
dc.identifier.uri	https://dspace.ctu.edu.vn/jspui/handle/123456789/124291	-
dc.description	61 pp.	vi_VN
dc.description.abstract	This report presents a language-mediated framework for video-to-music generation, aiming to automatically generate background music that is semantically aligned with video content. Unlike conventional approaches that directly fuse visual and audio features, the proposed system employs natural language as an intermediate representation to bridge video understanding and music generation. The framework integrates scene segmentation, video captioning, music feature inference, and prompt-based music generation using open-source models, enabling improved interpretability and controllability. Experimental evaluation is conducted on a subset of the SymMV dataset, with vocals removed to focus on background music. The system is assessed using both audio quality metrics and cross-modal video-music relationship metrics based on ImageBind embeddings. Results show that the proposed approach achieves strong global semantic alignment and consistent temporal correspondence between video and generated music, outperforming a state-of-the-art baseline in several semantic alignment metrics. Although beat-level synchronization remains limited, the generated music exhibits stable spectral characteristics suitable for background accompaniment. Overall, the results demonstrate that natural language serves as an effective intermediary modality for semantically coherent video-to-music generation in web-based and interactive applications.	vi_VN
dc.language.iso	en	vi_VN
dc.publisher	Trường Đại Học Cần Thơ	vi_VN
dc.subject	INFORMATION TECHNOLOGY - HIGH-QUALITY PROGRAM	vi_VN
dc.title	DEVELOPING A WEB APPLICATION FOR MUSIC GENERATION FROM VIDEO USING NATURAL LANGUAGE AS INTERMEDIARY USING OPEN SOURCE AI MODELS	vi_VN
dc.title.alternative	PHÁT TRIỂN ỨNG DỤNG WEB CHO TÁC VỤ SINH NHẠC TỪ VIDEO SỬ DỤNG NGÔN NGỮ TỰ NHIÊN LÀM TRUNG GIAN BẰNG CÁC MÔ HÌNH AI MÃ NGUỒN MỞ.	vi_VN
Appears in Collections:	Trường Công nghệ Thông tin & Truyền thông
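
The abstract distinguishes global semantic alignment from temporal correspondence between video and generated music, both measured on ImageBind embeddings. The record does not give the metric definitions, so the following is only a minimal sketch under stated assumptions: per-segment embeddings are taken as already computed (plain lists of floats here), "global" is taken to mean cosine similarity of the mean embeddings, and "temporal" is taken to mean the average cosine similarity of time-aligned segment pairs. The function names are hypothetical and are not taken from the thesis.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one embedding")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def global_alignment(video_segs: Sequence[Vector], music_segs: Sequence[Vector]) -> float:
    """ASSUMED definition: cosine similarity between the mean video and mean music embedding."""
    return cosine(mean_vector(video_segs), mean_vector(music_segs))


def temporal_alignment(video_segs: Sequence[Vector], music_segs: Sequence[Vector]) -> float:
    """ASSUMED definition: mean cosine similarity over time-aligned segment pairs."""
    if not video_segs or len(video_segs) != len(music_segs):
        raise ValueError("need the same non-zero number of video and music segments")
    pairs = zip(video_segs, music_segs)
    return sum(cosine(v, m) for v, m in pairs) / len(video_segs)


# Toy 2-D embeddings: the music matches the video content but in swapped order.
video = [[1.0, 0.0], [0.0, 1.0]]
music_swapped = [[0.0, 1.0], [1.0, 0.0]]
global_score = global_alignment(video, music_swapped)
temporal_score = temporal_alignment(video, music_swapped)
```

In this toy case the global score is high while the temporal score is zero, which illustrates why the two kinds of metric are reported separately: the global measure ignores segment order, whereas the temporal measure penalizes music that matches the video overall but at the wrong moments.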

Files in This Item:

File	Description	Size	Format
_file_ Restricted Access		1.26 MB	Adobe PDF
