Title: Transformer-Based Automatic Punctuation Prediction and Word Casing Reconstruction of the ASR Output
Authors: Švec, Jan
Lehečka, Jan
Šmídl, Luboš
Ircing, Pavel
Citation: ŠVEC, J., LEHEČKA, J., ŠMÍDL, L., IRCING, P. Transformer-Based Automatic Punctuation Prediction and Word Casing Reconstruction of the ASR Output. In Text, Speech, and Dialogue: 24th International Conference, TSD 2021, Olomouc, Czech Republic, September 6–9, 2021, Proceedings. Cham: Springer International Publishing, 2021. pp. 86-94. ISBN: 978-3-030-83526-2, ISSN: 0302-9743
Issue Date: 2021
Publisher: Springer International Publishing
Document type: conference paper
URI: 2-s2.0-85115216462
ISBN: 978-3-030-83526-2
ISSN: 0302-9743
Keywords in different language: ASR; BERT; T5; Punctuation predictor; Word casing reconstruction
Abstract in different language: The paper proposes a module for automatic punctuation prediction and casing reconstruction based on transformer architectures (BERT/T5), which constitute the current state of the art in many similar NLP tasks. The main motivation for our work was to increase the readability of the ASR output. The ASR output usually takes the form of a continuous stream of text, without punctuation marks and with all words in lowercase. The resulting punctuation and casing reconstruction module is evaluated on both written text and actual ASR output in three languages (English, Czech and Slovak).
Rights: The full text is accessible within the university to logged-in users.
© Springer
Appears in Collections: Konferenční příspěvky / Conference Papers (KKY)

Files in This Item:
File: Svec_Transformer-BasedAutomatic_TSD2021.pdf
Size: 10.29 MB
Format: Adobe PDF

Please use this identifier to cite or link to this item: http://hdl.handle.net/11025/47244
