Title: Structural metadata annotation of speech corpora: comparing broadcast news and broadcast conversations
Authors: Kolář, Jáchym; Švec, Jan
Citation: KOLÁŘ, Jáchym; ŠVEC, Jan. Structural metadata annotation of speech corpora: comparing broadcast news and broadcast conversations. In: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08): 28-29-30 May 2008. Marrakech: ELRA, 2008, p. [1-6]. ISBN 2-9517408-4-0.
Issue Date: 2008
Publisher: ELRA
Document type: article
URI: http://www.kky.zcu.cz/cs/publications/KolarJ_2008_StructuralMetadata
http://hdl.handle.net/11025/17111
ISBN: 2-9517408-4-0
Keywords: structural metadata extraction; automatic conversion of speech; speech corpora
Abstract: Structural metadata extraction (MDE) research aims to develop techniques for automatically converting raw speech recognition output into forms that are more useful to humans and to downstream automatic processes. This may be achieved by inserting boundaries of syntactic/semantic units into the flow of speech, labeling non-content words such as filled pauses and discourse markers for optional removal, and identifying sections of disfluent speech. This paper compares two Czech MDE speech corpora: one in the domain of broadcast news and the other in the domain of broadcast conversations. A variety of statistics about fillers, edit disfluencies, and syntactic/semantic units are presented. Among other findings, we report statistics indicating that the distribution of parts of speech (POS) in the word content of disfluent portions of speech differs from the overall POS distribution. The two Czech corpora are compared not only with each other but also with available statistics for English MDE corpora of broadcast news and telephone conversations.
Rights: © Jáchym Kolář - Jan Švec
Appears in Collections: Articles (KKY)

Files in This Item:
KolarJ_2008_StructuralMetadata.pdf (full text, 80.14 kB, Adobe PDF)


