AI-based visual speech recognition towards realistic avatars and lip-reading applications in the metaverse

dc.authorscopusid 59179509100
dc.authorscopusid 37041287000
dc.authorscopusid 59211781400
dc.authorscopusid 36183861000
dc.authorscopusid 57213001965
dc.authorscopusid 55602202100
dc.authorwosid Nohuddin, Puteri Nor Ellyza/GWV-2366-2022
dc.contributor.author Li, Ying
dc.contributor.author Hashim, Ahmad Sobri
dc.contributor.author Lin, Yun
dc.contributor.author Nohuddin, Puteri N. E.
dc.contributor.author Venkatachalam, K.
dc.contributor.author Ahmadian, Ali
dc.date.accessioned 2024-09-11T07:41:10Z
dc.date.available 2024-09-11T07:41:10Z
dc.date.issued 2024
dc.department Okan University en_US
dc.department-temp [Li, Ying] Guizhou Univ, Coll Int Educ, Guiyang 550025, Guizhou, Peoples R China; [Hashim, Ahmad Sobri] Univ Teknol PETRONAS, Comp & Informat Sci Dept, Seri Iskandar 32610, Perak, Malaysia; [Lin, Yun] Zhejiang Expressway Intelligent Toll Collect Oper, Hangzhou 310000, Peoples R China; [Nohuddin, Puteri N. E.] Univ Kebangsaan Malaysia, Inst Visual Informat, Bangi, Selangor, Malaysia; [Nohuddin, Puteri N. E.] Univ Kebangsaan Malaysia, iAI Res Grp, Bangi, Selangor, Malaysia; [Venkatachalam, K.] Univ Hradec Kalove, Fac Sci, Dept Appl Cybernet, Hradec Kralove 50003, Czech Republic; [Ahmadian, Ali] Istanbul Okan Univ, Fac Engn & Nat Sci, Istanbul, Turkiye en_US
dc.description.abstract The metaverse, a virtually shared digital world where individuals interact, create, and explore, has witnessed rapid evolution and widespread adoption. Communication between avatars is crucial to their actions in the metaverse. Advances in natural language processing have enabled significant progress in producing spoken conversations. Within this digital landscape, the integration of Visual Speech Recognition (VSR) powered by deep learning emerges as a transformative application. This research delves into the concept and implications of VSR in the metaverse, focusing on developing realistic avatars and a lip-reading application within the metaverse using Artificial Intelligence (AI) techniques for visual speech recognition. Visual Speech Recognition in the metaverse refers to using deep learning techniques to comprehend and respond to spoken language, relying on the visual cues provided by users' avatars. This multidisciplinary approach combines computer vision and natural language processing, enabling avatars to understand spoken words by analyzing the movements of their lips and facial expressions. Key components encompass the collection of extensive video datasets, the employment of 3D Convolutional Neural Networks (3D CNNs) combined with ShuffleNet and Densely Connected Temporal Convolutional Networks (DC-TCN), together termed CFS-DCTCN, to model visual and temporal features, and the integration of contextual understanding mechanisms. Two datasets, the Lip Reading in the Wild (LRW) dataset and the GRID Corpus, are utilized to validate the proposed model. As the metaverse continues to gain prominence, integrating Visual Speech Recognition through deep learning represents a pivotal step towards forging immersive and dynamic virtual worlds where communication transcends physical boundaries. 
This paper contributes to the foundation of technology-driven metaverse development and fosters a future where digital interactions mirror the complexities of human communication. The proposed model achieves 99.5 % accuracy on the LRW dataset and 98.8 % on the GRID dataset. en_US
dc.description.sponsorship Faculty of Science, University of Hradec Kralove, Czech Republic; Yayasan Universiti Teknologi PETRONAS (YUTP) under the YUTP Fundamental Research Grant Scheme [015LC0-24] en_US
dc.description.sponsorship Venkatachalam Kandasamy was supported by the Faculty of Science, University of Hradec Kralove, Czech Republic. This work was also partially funded, for Ahmad Sobri Hashim, by Yayasan Universiti Teknologi PETRONAS (YUTP) under the YUTP Fundamental Research Grant Scheme (015LC0-24). en_US
dc.description.woscitationindex Science Citation Index Expanded
dc.identifier.citationcount 0
dc.identifier.doi 10.1016/j.asoc.2024.111906
dc.identifier.issn 1568-4946
dc.identifier.issn 1872-9681
dc.identifier.scopus 2-s2.0-85198018268
dc.identifier.scopusquality Q1
dc.identifier.uri https://doi.org/10.1016/j.asoc.2024.111906
dc.identifier.uri https://hdl.handle.net/20.500.14517/6232
dc.identifier.volume 164 en_US
dc.identifier.wos WOS:001269620400001
dc.identifier.wosquality Q1
dc.language.iso en
dc.publisher Elsevier en_US
dc.relation.publicationcategory Makale - Uluslararası Hakemli Dergi - Kurum Öğretim Elemanı en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.scopus.citedbyCount 5
dc.subject Artificial intelligence en_US
dc.subject Metaverse en_US
dc.subject Visual speech recognition en_US
dc.subject Deep learning en_US
dc.subject Avatars en_US
dc.subject Virtual communication en_US
dc.title AI-based visual speech recognition towards realistic avatars and lip-reading applications in the metaverse en_US
dc.type Article en_US
dc.wos.citedbyCount 4
dspace.entity.type Publication