One Model is Not Enough: Ensembles for Isolated Sign Language Recognition

Hrúz, Marek; Gruber, Ivan; Kanis, Jakub; Boháček, Matyáš; Hlaváč, Miroslav; Krňoul, Zdeněk

Full metadata record

DC field	Value	Language
dc.contributor.author	Hrúz, Marek
dc.contributor.author	Gruber, Ivan
dc.contributor.author	Kanis, Jakub
dc.contributor.author	Boháček, Matyáš
dc.contributor.author	Hlaváč, Miroslav
dc.contributor.author	Krňoul, Zdeněk
dc.date.accessioned	2023-03-06T11:00:26Z	-
dc.date.available	2023-03-06T11:00:26Z	-
dc.date.issued	2022
dc.identifier.citation	HRÚZ, M. GRUBER, I. KANIS, J. BOHÁČEK, M. HLAVÁČ, M. KRŇOUL, Z. One Model is Not Enough: Ensembles for Isolated Sign Language Recognition. SENSORS, 2022, vol. 22, no. 13, unpaginated. ISSN: 1424-8220	cs
dc.identifier.issn	1424-8220
dc.identifier.uri	2-s2.0-85133217387
dc.identifier.uri	http://hdl.handle.net/11025/51652
dc.format	17 pp.	cs
dc.format.mimetype	application/pdf
dc.language.iso	en	en
dc.publisher	MDPI	en
dc.relation.ispartofseries	SENSORS	en
dc.rights	© authors	en
dc.title	One Model is Not Enough: Ensembles for Isolated Sign Language Recognition	en
dc.type	článek	cs
dc.type	article	en
dc.rights.access	openAccess	en
dc.type.version	publishedVersion	en
dc.description.abstract-translated	In this paper, we dive into sign language recognition, focusing on the recognition of isolated signs. The task is defined as a classification problem, where a sequence of frames (i.e., images) is recognized as one of the given sign language glosses. We analyze two appearance-based approaches, I3D and TimeSformer, and one pose-based approach, SPOTER. The appearance-based approaches are trained on a few different data modalities, whereas the performance of SPOTER is evaluated on different types of preprocessing. All the methods are tested on two publicly available datasets: AUTSL and WLASL300. We experiment with ensemble techniques to achieve new state-of-the-art results of 73.84% accuracy on the WLASL300 dataset by using the CMA-ES optimization method to find the best ensemble weight parameters. Furthermore, we present an ensembling technique based on the Transformer model, which we call Neural Ensembler.	en
dc.subject.translated	sign language recognition	en
dc.subject.translated	CNN	en
dc.subject.translated	Transformer	en
dc.subject.translated	ensemble	en
dc.identifier.doi	10.3390/s22135043
dc.type.status	Peer-reviewed	en
dc.identifier.document-number	824167200001
dc.identifier.obd	43937108
dc.project.ID	TN01000024/Národní centrum kompetence - Kybernetika a umělá inteligence	cs
dc.project.ID	90042/Velká výzkumná infrastruktura povinnost (J) - CESNET II	cs
dc.project.ID	EF15_003/0000466/Umělá inteligence a uvažování	cs
Appears in collections:	Články / Articles (NTIS) Články / Articles (KKY) OBD
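
The abstract describes combining several recognizers through ensemble weights found by CMA-ES. As a rough illustration of that idea only, the following sketch forms a weighted sum of per-model class probabilities and searches for the weights with a simple elitist (1+λ) evolution strategy. This is a stand-in for CMA-ES (it does no covariance adaptation) and is not the authors' implementation. The three "models", their probabilities, the labels, and all function names are hypothetical toy values.

```python
import random


def ensemble_predict(weights, model_probs):
    """Weighted sum of per-model class probabilities, then argmax per sample."""
    n_samples = len(model_probs[0])
    n_classes = len(model_probs[0][0])
    preds = []
    for i in range(n_samples):
        scores = [
            sum(w * probs[i][c] for w, probs in zip(weights, model_probs))
            for c in range(n_classes)
        ]
        preds.append(max(range(n_classes), key=scores.__getitem__))
    return preds


def accuracy(weights, model_probs, labels):
    """Fraction of samples whose ensemble prediction matches the label."""
    preds = ensemble_predict(weights, model_probs)
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def optimize_weights(model_probs, labels, generations=30, population=16,
                     sigma=0.3, seed=0):
    """Elitist (1+lambda) evolution strategy; a simplified stand-in for CMA-ES."""
    rng = random.Random(seed)
    best = [1.0 / len(model_probs)] * len(model_probs)  # start from uniform
    best_acc = accuracy(best, model_probs, labels)
    for _ in range(generations):
        # Sample all offspring from the current parent, clipping at zero.
        offspring = [
            [max(0.0, w + rng.gauss(0.0, sigma)) for w in best]
            for _ in range(population)
        ]
        for cand in offspring:
            acc = accuracy(cand, model_probs, labels)
            if acc > best_acc:  # keep the parent unless strictly improved
                best, best_acc = cand, acc
    return best, best_acc


# Hypothetical toy data: 3 models, 6 samples, 3 classes.
labels = [0, 1, 2, 0, 1, 2]
model_a = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6],
           [0.6, 0.3, 0.1], [0.3, 0.4, 0.3], [0.1, 0.3, 0.6]]
model_b = [[0.2, 0.7, 0.1], [0.6, 0.3, 0.1], [0.1, 0.1, 0.8],
           [0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.7, 0.2, 0.1]]
model_c = [[0.1, 0.1, 0.8], [0.1, 0.1, 0.8], [0.8, 0.1, 0.1],
           [0.1, 0.1, 0.8], [0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]
model_probs = [model_a, model_b, model_c]

uniform_acc = accuracy([1 / 3, 1 / 3, 1 / 3], model_probs, labels)
best_w, best_acc = optimize_weights(model_probs, labels)
print("uniform accuracy:", uniform_acc)
print("optimized weights:", best_w, "accuracy:", best_acc)
```

Because the search starts from uniform weights and only accepts strict improvements, the optimized accuracy can never fall below the uniform baseline on the data it was fitted to. In practice the weights would presumably be fitted on a held-out validation split rather than on the evaluation data.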

Files attached to this record:

File	Size	Format
sensors-22-05043-v3-2.pdf	1.12 MB	Adobe PDF


Use this identifier to cite or link to this record: http://hdl.handle.net/11025/51652
