Title: | One Model is Not Enough: Ensembles for Isolated Sign Language Recognition |
Authors: | Hrúz, Marek; Gruber, Ivan; Kanis, Jakub; Boháček, Matyáš; Hlaváč, Miroslav; Krňoul, Zdeněk |
Citation: | HRÚZ, M., GRUBER, I., KANIS, J., BOHÁČEK, M., HLAVÁČ, M., KRŇOUL, Z. One Model is Not Enough: Ensembles for Isolated Sign Language Recognition. SENSORS, 2022, vol. 22, no. 13, unpaginated. ISSN: 1424-8220 |
Issue Date: | 2022 |
Publisher: | MDPI |
Document type: | article |
URI: | 2-s2.0-85133217387; http://hdl.handle.net/11025/51652 |
ISSN: | 1424-8220 |
Keywords in different language: | sign language recognition; CNN; Transformer; ensemble |
Abstract in different language: | In this paper, we dive into sign language recognition, focusing on the recognition of isolated signs. The task is defined as a classification problem, where a sequence of frames (i.e., images) is recognized as one of the given sign language glosses. We analyze two appearance-based approaches, I3D and TimeSformer, and one pose-based approach, SPOTER. The appearance-based approaches are trained on a few different data modalities, whereas the performance of SPOTER is evaluated on different types of preprocessing. All the methods are tested on two publicly available datasets: AUTSL and WLASL300. We experiment with ensemble techniques to achieve new state-of-the-art results of 73.84% accuracy on the WLASL300 dataset by using the CMA-ES optimization method to find the best ensemble weight parameters. Furthermore, we present an ensembling technique based on the Transformer model, which we call Neural Ensembler. |
Rights: | © authors |
Appears in Collections: | Articles (NTIS); Articles (KKY); OBD |
Files in This Item:
File | Size | Format |
---|---|---|
sensors-22-05043-v3-2.pdf | 1.12 MB | Adobe PDF |
Please use this identifier to cite or link to this item:
http://hdl.handle.net/11025/51652