Title: Semi-Supervised Learning Approach for Fine Grained Human Hand Action Recognition in Industrial Assembly
Authors: Sturm, Fabian
Sathiyababu, Rahul
Hergenroether, Elke
Siegel, Melanie
Citation: WSCG 2023: full papers proceedings: 31. International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision, p. 340-350.
Issue Date: 2023
Publisher: Václav Skala - UNION Agency
Document type: conference paper
conferenceObject
URI: http://hdl.handle.net/11025/54442
ISBN: 978-80-86943-32-9
ISSN: 2464-4617 (print)
2464-4625 (CD/DVD)
Keywords: human action recognition; industrial assembly; semi-supervised learning; transfer learning; transformer
Abstract: Until now, it has been impossible to imagine industrial manual assembly without humans, owing to their flexibility and adaptability. But the assembly process does not always benefit from human intervention. Assemblers are prone to errors caused by disturbance, distraction, or inattention, so they need intelligent support; because the data patterns recur continuously and repetitively, the task is well suited to deep learning approaches. However, sufficient labels for the data are not always available. In this work, a spatio-temporal transformer model is used to address the scarcity of labels in an industrial setting. A pseudo-labeling method from the field of semi-supervised transfer learning is applied for model training, and the entire architecture is adapted to the fine-grained recognition of human hand actions in assembly. This implementation significantly improves the generalization of the model during training across different variations of strong and weak classes from the ground truth, and shows that deep learning technologies can be used in an industrial setting even with few labels. Beyond the main goal of improving the model's generalization with less training data and exploring different variations of appropriate ground truth and new classes, the recognition capabilities of the model are improved by adding convolution to the temporal embedding layer, which increases test accuracy by over 5% compared to a similar predecessor model.
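Illustration (not part of the record): the abstract names pseudo-labeling but does not say which variant the paper uses. As a rough sketch of the general idea only, the snippet below shows one common variant, in which a model's predictions on unlabeled samples are kept as training labels when the top class probability clears a confidence threshold. The function name `select_pseudo_labels`, the threshold of 0.9, and the example probabilities are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def select_pseudo_labels(
    probabilities: Sequence[Sequence[float]],
    threshold: float = 0.9,  # assumed value, not from the paper
) -> List[Tuple[int, int]]:
    """Return (sample_index, predicted_class) pairs for confident predictions."""
    selected = []
    for index, class_probs in enumerate(probabilities):
        # Pick the class with the highest predicted probability.
        best_class = max(range(len(class_probs)), key=lambda c: class_probs[c])
        # Keep the prediction as a pseudo-label only if it is confident enough.
        if class_probs[best_class] >= threshold:
            selected.append((index, best_class))
    return selected


# Hypothetical class probabilities for three unlabeled clips and three classes.
unlabeled_probs = [
    [0.95, 0.03, 0.02],
    [0.40, 0.35, 0.25],
    [0.05, 0.02, 0.93],
]
pseudo_labeled = select_pseudo_labels(unlabeled_probs, threshold=0.9)
print(pseudo_labeled)
```

In a training loop of this kind, the selected pairs would typically be added to the labeled set before the next round of training; how the paper schedules or weights them is not stated in the abstract.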
Rights: © Václav Skala - UNION Agency
Appears in Collections: WSCG 2023: Full Papers Proceedings

Files in This Item:
File: F89-full.pdf
Description: Full text
Size: 7.54 MB
Format: Adobe PDF

