Title: Automatic Correction of i/y Spelling in Czech ASR Output
Authors: Švec, Jan
Lehečka, Jan
Šmídl, Luboš
Ircing, Pavel
Source document citation: ŠVEC, J., LEHEČKA, J., ŠMÍDL, L., IRCING, P. Automatic Correction of i/y Spelling in Czech ASR Output. In: Text, Speech, and Dialogue: 23rd International Conference, TSD 2020, Brno, Czech Republic, September 8-11, 2020, Proceedings. Cham: Springer, 2020. pp. 321-330. ISBN 978-3-030-58322-4, ISSN 0302-9743.
Date issued: 2020
Publisher: Springer
Document type: conference paper
URI: 2-s2.0-85091182120
ISBN: 978-3-030-58322-4
ISSN: 0302-9743
Keywords in another language: Grammatical error correction, ASR, BERT
Abstract in another language: This paper concentrates on the design and evaluation of a method that automatically corrects the spelling of i/y in Czech words at the output of the ASR decoder. After analysis of both the Czech grammar rules and the data, we decided to deal only with endings consisting of the consonants b/f/l/m/p/s/v/z followed by i/y in both short and long forms. The correction is framed as a classification task in which a word can belong to the “i” class, the “y” class or the “empty” class. Using the state-of-the-art Bidirectional Encoder Representations from Transformers (BERT) architecture, we were able to substantially improve the correctness of the i/y spelling on both the simulated and the real ASR output. Since the misspelling of i/y in Czech texts is seen by the majority of native Czech speakers as a blatant error, the corrected output greatly improves the perceived quality of the ASR system.
Rights: Full text is not accessible.
© Springer
Appears in collections: Konferenční příspěvky / Conference papers (NTIS)
Konferenční příspěvky / Conference Papers (KKY)

Files attached to this record:
File: Švec2020_Chapter_AutomaticCorrectionOfIYSpellin.pdf
Size: 251.46 kB
Format: Adobe PDF

Please use this identifier to cite or link to this item: http://hdl.handle.net/11025/43118

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
