Automatic punctuation annotation in Czech broadcast news speech

Kolář, Jáchym; Švec, Jan; Psutka, Josef

Full metadata record

DC field	Value	Language
dc.contributor.author	Kolář, Jáchym
dc.contributor.author	Švec, Jan
dc.contributor.author	Psutka, Josef
dc.date.accessioned	2016-01-06T09:07:40Z
dc.date.available	2016-01-06T09:07:40Z
dc.date.issued	2004
dc.identifier.citation	KOLÁŘ, Jáchym; ŠVEC, Jan; PSUTKA, Josef. Automatic punctuation annotation in Czech broadcast news speech. In: SPECOM 2004 Proceedings. St. Petersburg: Institute for Informatics and Automation of RAS (SPIIRAS), 2004, p. 319-325. ISBN 5-7452-0110-X.	en
dc.identifier.isbn	5-7452-0110-X
dc.identifier.uri	http://www.kky.zcu.cz/cs/publications/KolarJ_2004_Automaticpunctuation
dc.identifier.uri	http://hdl.handle.net/11025/17116
dc.description.abstract	This article describes our initial experiments with automatic punctuation annotation in spoken Czech. We used two statistical models: a prosodic model and a language model. Two implementations of the prosodic model were tested: CART and MLP. A hidden-event N-gram model was used for language modeling. On reference transcripts, the combined model achieved an accuracy of 95.2% and an F-measure of 78.2%.	cs
dc.format	7 p.	cs
dc.format.mimetype	application/pdf
dc.language.iso	en	en
dc.publisher	SPIIRAS	en
dc.rights	© Jáchym Kolář - Jan Švec - Josef Psutka	cs
dc.subject	automatic punctuation	cs
dc.subject	prosody	cs
dc.subject	sentence boundaries	cs
dc.subject	radio news	cs
dc.subject	morphological tagging	cs
dc.title	Automatic punctuation annotation in Czech broadcast news speech	en
dc.title.alternative	Automatická anotace interpunkce v řečových nahrávkách českých zpráv	cs
dc.type	article	cs
dc.type	article	en
dc.rights.access	openAccess	en
dc.type.version	publishedVersion	en
dc.description.abstract-translated	This paper reports our initial experiments with automatic punctuation annotation from speech, focusing on Czech broadcast news. We employed two statistical models: a prosodic model and a language model. The prosodic model expresses relationships between prosodic quantities (such as pitch, speaking rate, or loudness) and punctuation marks. We tested two implementations of this model: a decision tree and a multi-layer perceptron. Hidden-event N-gram models were employed for language modeling. Instead of using an ordinary word-based model, we replaced infrequent word forms with their morphological tags and trained a mixed model. Scores from both models can be combined. The model combining the language model with the decision tree yielded superior results. Testing on true words, we achieved a classification accuracy of 95.2% and an F-measure of 78.2%.	en
dc.subject.translated	automatic punctuation	en
dc.subject.translated	prosody	en
dc.subject.translated	sentence boundary	en
dc.subject.translated	broadcast news	en
dc.subject.translated	tag-based models	en
dc.type.status	Peer-reviewed	en
Appears in collections:	Články / Articles (KKY)
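
The abstract mentions a mixed model in which infrequent word forms are replaced by their morphological tags. The following Python sketch illustrates only that replacement step, under stated assumptions: the function name `build_mixed_tokens`, the `min_count` threshold, the angle-bracket tag notation, and the toy English data are all hypothetical and are not taken from the paper, which does not specify these details in this record.

```python
from collections import Counter


def build_mixed_tokens(sentences, tags, min_count=2):
    """Replace word forms seen fewer than min_count times with their tag.

    sentences: list of token lists; tags: parallel list of tag lists.
    The threshold and the <TAG> notation are illustrative assumptions.
    """
    # Count how often each word form occurs in the whole corpus.
    counts = Counter(word for sent in sentences for word in sent)
    mixed = []
    for sent, sent_tags in zip(sentences, tags):
        mixed.append([
            # Keep frequent word forms; back off to the tag otherwise.
            word if counts[word] >= min_count else f"<{tag}>"
            for word, tag in zip(sent, sent_tags)
        ])
    return mixed


# Toy corpus with hypothetical tags (not Czech morphological tags).
sentences = [["the", "cat", "sleeps"], ["the", "dog", "sleeps"]]
tags = [["DET", "NOUN", "VERB"], ["DET", "NOUN", "VERB"]]
mixed = build_mixed_tokens(sentences, tags)
```

In this toy run, "cat" and "dog" each occur once and are replaced by `<NOUN>`, while "the" and "sleeps" are kept. An N-gram model trained on such mixed sequences would share statistics across rare word forms that carry the same tag.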

Files attached to this record:

File	Description	Size	Format
KolarJ_2004_Automaticpunctuation.pdf	Full text	94.95 kB	Adobe PDF

Use this identifier to cite or link to this record: http://hdl.handle.net/11025/17116

All records in DSpace are protected by copyright, with all rights reserved.
