Complexity of the TDNN Acoustic Model with Respect to the HMM Topology

Psutka, Josef; Vaněk, Jan; Pražák, Aleš

Full metadata record

DC field	Value	Language
dc.contributor.author	Psutka, Josef
dc.contributor.author	Vaněk, Jan
dc.contributor.author	Pražák, Aleš
dc.date.accessioned	2021-02-22T11:00:20Z	-
dc.date.available	2021-02-22T11:00:20Z	-
dc.date.issued	2020
dc.identifier.citation	PSUTKA, J., VANĚK, J., PRAŽÁK, A. Complexity of the TDNN Acoustic Model with Respect to the HMM Topology. In: Text, Speech, and Dialogue 23rd International Conference, TSD 2020, Brno, Czech Republic, September 8-11, 2020, Proceedings. Cham: Springer, 2020. pp. 465-473. ISBN 978-3-030-58322-4, ISSN 0302-9743.	cs
dc.identifier.isbn	978-3-030-58322-4
dc.identifier.issn	0302-9743
dc.identifier.uri	2-s2.0-85091157003
dc.identifier.uri	http://hdl.handle.net/11025/42718
dc.format	9 pp.	cs
dc.format.mimetype	application/pdf
dc.language.iso	en	en
dc.publisher	Springer	en
dc.relation.ispartofseries	Text, Speech, and Dialogue 23rd International Conference, TSD 2020, Brno, Czech Republic, September 8-11, 2020, Proceedings	en
dc.rights	Full text is not accessible.	cs
dc.rights	© Springer	en
dc.title	Complexity of the TDNN Acoustic Model with Respect to the HMM Topology	en
dc.type	conference paper	cs
dc.type	conferenceObject	en
dc.rights.access	closedAccess	en
dc.type.version	publishedVersion	en
dc.description.abstract-translated	In this paper, we discuss some of the properties of training acoustic models using a lattice-free version of the maximum mutual information criterion (LF-MMI). Currently, the LF-MMI method achieves state-of-the-art results on many speech recognition tasks. Some of the key features of the LF-MMI approach are: training the DNN without initialization from a cross-entropy system, the use of a 3-fold reduced frame rate, and the use of a simpler HMM topology. In a typical LF-MMI training procedure, the conventional 3-state HMM topology is replaced with a special 1-state HMM topology that has different pdfs on the self-loop and forward transitions. We discuss both the different types of HMM topology (conventional 1-, 2- and 3-state HMM topologies) and the advantages of biphone context modeling over the original triphone context or a simpler monophone context. We also mention the impact of the subsampling factor on WER.	en
dc.subject.translated	Speech recognition, Acoustic modeling, HMM topology, Lattice-free MMI	en
dc.identifier.doi	10.1007/978-3-030-58323-1_50
dc.type.status	Peer-reviewed	en
dc.identifier.obd	43930364
dc.project.ID	LO1506/PUNTIS - Sustainability support of the centre NTIS - New Technologies for the Information Society	cs
Appears in collections:	Conference papers (NTIS); Conference papers (KKY); OBD

Files attached to this record:

File	Size	Format
Psutka2020_Chapter_ComplexityOfTheTDNNAcousticMod.pdf	260.45 kB	Adobe PDF

Use this identifier to cite or link to this record: http://hdl.handle.net/11025/42718

All records in DSpace are protected by copyright; all rights reserved.