On Comparison of Phonetic Representations for Czech Neural Speech Synthesis

Matoušek, Jindřich; Tihelka, Daniel

Full metadata record

DC field	Value	Language
dc.contributor.author	Matoušek, Jindřich
dc.contributor.author	Tihelka, Daniel
dc.date.accessioned	2023-01-16T11:00:16Z	-
dc.date.available	2023-01-16T11:00:16Z	-
dc.date.issued	2022
dc.identifier.citation	MATOUŠEK, J., TIHELKA, D. On Comparison of Phonetic Representations for Czech Neural Speech Synthesis. In Text, Speech, and Dialogue 25th International Conference, TSD 2022, Brno, Czech Republic, September 6–9, 2022, Proceedings. Cham: Springer International Publishing, 2022. pp. 410-422. ISBN: 978-3-031-16269-5, ISSN: 0302-9743	cs
dc.identifier.isbn	978-3-031-16269-5
dc.identifier.issn	0302-9743
dc.identifier.uri	2-s2.0-85139064069
dc.identifier.uri	http://hdl.handle.net/11025/50927
dc.format	13 pp.	cs
dc.format.mimetype	application/pdf
dc.language.iso	en	en
dc.publisher	Springer International Publishing	en
dc.relation.ispartofseries	Text, Speech, and Dialogue 25th International Conference, TSD 2022, Brno, Czech Republic, September 6–9, 2022, Proceedings	en
dc.rights	The full text is accessible within the university to logged-in users.	cs
dc.rights	© Springer Nature Switzerland AG	en
dc.title	On Comparison of Phonetic Representations for Czech Neural Speech Synthesis	en
dc.type	conference paper	cs
dc.type	ConferenceObject	en
dc.rights.access	restrictedAccess	en
dc.type.version	publishedVersion	en
dc.description.abstract-translated	In this paper, we investigate two research questions related to the phonetic representation of input text in Czech neural speech synthesis: 1) whether we can afford to reduce the phonetic alphabet, and 2) whether we can remove pauses from phonetic transcription and let the speech synthesis model predict the pause positions itself. In our experiments, three different modern speech synthesis models (FastSpeech 2 + Multi-band MelGAN, Glow-TTS + UnivNet, and VITS) were employed. We have found that the reduced phonetic alphabet outperforms the traditionally used full phonetic alphabet. On the other hand, removing pauses does not help. The presence of pauses (predicted by an external pause prediction tool) in phonetic transcription leads to a slightly better quality of synthetic speech.	en
dc.subject.translated	neural speech synthesis	en
dc.subject.translated	phonetic representation	en
dc.subject.translated	phonetic reductions	en
dc.subject.translated	pause modeling	en
dc.subject.translated	Czech language	en
dc.identifier.doi	10.1007/978-3-031-16270-1_34
dc.type.status	Peer-reviewed	en
dc.identifier.obd	43936699
dc.project.ID	90140/Large Research Infrastructure_(J) - e-INFRA CZ	cs
dc.project.ID	TL05000546/Use of a multimedia explanatory dictionary for modern teaching of Czech	cs
Appears in collections:	Conference papers (NTIS); Conference papers (KKY); OBD

Files attached to this record:

File	Size	Format
Matousek_Tihelka-On_Compariso_of_Phonetic_Representations_TSD_2022.pdf	271.71 kB	Adobe PDF

Use this identifier to cite or link to this record: http://hdl.handle.net/11025/50927

All records in DSpace are protected by copyright, with all rights reserved.
