Automatic statistical evaluation of quality of unit selection speech synthesis with different prosody manipulations

Přibil, Jiří; Přibilová, Anna; Matoušek, Jindřich

Title:	Automatic statistical evaluation of quality of unit selection speech synthesis with different prosody manipulations
Other Titles:	Automatická statistická evaluace kvality syntézy řeči výběrem jednotek s různými prozodickými manipulacemi
Authors:	Přibil, Jiří; Přibilová, Anna; Matoušek, Jindřich
Citation:	PŘIBIL, J., PŘIBILOVÁ, A., MATOUŠEK, J. Automatic statistical evaluation of quality of unit selection speech synthesis with different prosody manipulations. Journal of Electrical Engineering, 2020, vol. 71, no. 2, pp. 78-86. ISSN 1335-3632.
Issue Date:	2020
Publisher:	De Gruyter
Document type:	article
URI:	2-s2.0-85085749611 http://hdl.handle.net/11025/42609
ISSN:	1335-3632
Keywords:	listening test; objective and subjective evaluation; quality of synthetic speech; statistical analysis
Abstract:	The quality of synthesized speech is a crucial issue when comparing text-to-speech (TTS) systems. We proposed a system that evaluates speech quality automatically through statistical analysis of temporal features (duration, phrasing, and time structuring of the analysed sentence) together with standard spectral and prosodic features. The system was successfully tested on sentences produced by a unit selection speech synthesizer with a male and a female voice, using two different approaches to prosody manipulation. The experiments showed that all three types of speech features (spectral, prosodic, and temporal) are necessary for correct, sharp, and stable results. Furthermore, the number of statistical parameters used has a significant impact on the correctness and precision of the evaluation results. The experiments also demonstrated that enlarging the speech material improves the stability of the whole evaluation process. Finally, the functionality of the proposed system was verified by comparing its results with those of a standard listening test.
Rights:	© De Gruyter
Appears in Collections:	Články / Articles (KKY) OBD

Files in This Item:

File	Size	Format
[1339309X - Journal of Electrical Engineering] Automatic statistical evaluation of quality of unit selection speech synthesis with different prosody manipulations.pdf	296.26 kB	Adobe PDF

Please use this identifier to cite or link to this item: http://hdl.handle.net/11025/42609
