Identification and classification of DICOM files with burned-in text content

Včelák, Petr; Kryl, Martin; Kratochvíl, Michal; Klečková, Jana

Full metadata record

DC field	Value	Language
dc.contributor.author	Včelák, Petr
dc.contributor.author	Kryl, Martin
dc.contributor.author	Kratochvíl, Michal
dc.contributor.author	Klečková, Jana
dc.date.accessioned	2019-11-11T11:00:21Z	-
dc.date.available	2019-11-11T11:00:21Z	-
dc.date.issued	2019
dc.identifier.citation	BRYCHCÍN, T., TAYLOR, S., SVOBODA, L. Cross-lingual word analogies using linear transformations between semantic spaces. Expert Systems with Applications, 2019, vol. 135, no. NOV 30 2019, pp. 287-295. ISSN 0957-4174.	en
dc.identifier.issn	1386-5056
dc.identifier.uri	2-s2.0-85067242443
dc.identifier.uri	http://hdl.handle.net/11025/35854
dc.description.abstract	Background: Protected personal and health information burned into the pixels of a DICOM image is, for various reasons, not indicated. This complicates secondary use of such data. In recent years there have been several attempts to anonymize or de-identify DICOM files. Existing approaches have various limitations, and no completely reliable solution exists. For large datasets in particular, files that potentially violate privacy must be analyzed and identified quickly. Methods: Classification is based on an adaptive-iterative algorithm designed to identify one of three classes. Several image transformations, optical character recognition, and filters are applied; then a local decision is made. A confirmed local decision is final. The classifier was trained on a dataset of 15,334 images of various modalities. Results: False positive rates are below 4.00% in all cases and 1.81% in the critical case of detecting protected personal and health information. The classifier's weighted average sensitivity was 94.85%, its weighted average inverse sensitivity reached 97.42%, and Cohen's kappa coefficient was 0.920. Conclusion: The proposed approach to classifying text burned into images is highly configurable and able to analyze images from various modalities with background noise. The solution was validated and aims to identify DICOM files to which access must be restricted or which must be thoroughly de-identified because they contain personal data. Unlike existing tools, the recognized text, including its coordinates, can be further used for de-identification.	cs
dc.format	9 pp.	cs
dc.format.mimetype	application/pdf
dc.language.iso	en	en
dc.publisher	Elsevier	en
dc.relation.ispartofseries	Expert Systems with Applications	en
dc.rights	The full text is accessible within the university to logged-in users.	cs
dc.rights	© Elsevier	en
dc.subject	Vypálené chráněné osobní a zdravotní údaje	cs
dc.subject	klasifikace	cs
dc.subject	de-identifikace	cs
dc.subject	DICOM	cs
dc.subject	HIPAA	cs
dc.subject	detekce textu	cs
dc.title	Identification and classification of DICOM files with burned-in text content	en
dc.title.alternative	Identifikace a klasifikace DICOM souborů s vypáleným textem ve snímku	cs
dc.type	článek	cs
dc.type	article	en
dc.rights.access	restrictedAccess	en
dc.type.version	publishedVersion	en
dc.description.abstract-translated	The ability to represent the meaning of words is one of the core parts of natural language understanding (NLU), with applications ranging across machine translation, summarization, question answering, information retrieval, etc. The need for reasoning in multilingual contexts and transferring knowledge in cross-lingual systems has given rise to cross-lingual semantic spaces, which learn representations of words across different languages. With growing attention to cross-lingual representations, it has become crucial to investigate proper evaluation schemes. The word-analogy-based evaluation has been one of the most common tools to evaluate linguistic relationships (such as male-female relationships or verb tenses) encoded in monolingual meaning representations. In this paper, we go beyond monolingual representations and generalize the word analogy task across languages to provide a new intrinsic evaluation tool for cross-lingual semantic spaces. Our approach allows examining cross-lingual projections and their impact on different aspects of meaning. It helps to discover potential weaknesses or advantages of cross-lingual methods before they are incorporated into different intelligent systems. We experiment with six languages within different language families, including English, German, Spanish, Italian, Czech, and Croatian. State-of-the-art monolingual semantic spaces are transformed into a shared space using dictionaries of word translations. We compare several linear transformations and rank them for experiments with monolingual (no transformation), bilingual (one semantic space is transformed to another), and multilingual (all semantic spaces are transformed onto English space) versions of semantic spaces. We show that tested linear transformations preserve relationships between words (word analogies) and lead to impressive results. We achieve average accuracy of 51.1%, 43.1%, and 38.2% for monolingual, bilingual, and multilingual semantic spaces, respectively.	en
dc.subject.translated	Burned-in protected health information	en
dc.subject.translated	Classification	en
dc.subject.translated	De-identification	en
dc.subject.translated	DICOM	en
dc.subject.translated	HIPAA	en
dc.subject.translated	Text detection	en
dc.identifier.doi	10.1016/j.ijmedinf.2019.02.011
dc.type.status	Peer-reviewed	en
dc.identifier.document-number	465414600016
dc.identifier.obd	43926834
dc.project.ID	EF17_048/0007267/InteCom: VaV inteligentních komponent pokročilých technologií pro plzeňskou metropolitní oblast	cs
dc.project.ID	SGS-2019-018/Zpracování heterogenních dat a jejich specializované aplikace	cs
Appears in collections:	Články / Articles (NTIS) Články / Articles (KIV) OBD
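
The abstract in this record reports a weighted average sensitivity, a weighted average inverse sensitivity, and Cohen's kappa for a three-class classifier. The following is a minimal sketch of how such figures can be derived from a confusion matrix; it is not the authors' implementation. The function names and the example counts are hypothetical, and "inverse sensitivity" is assumed here to mean the true negative rate of each class, weighted by class support.

```python
from typing import List, Tuple

Matrix = List[List[int]]  # rows: true class, columns: predicted class


def per_class_rates(cm: Matrix) -> List[Tuple[float, float, int]]:
    """Return (sensitivity, inverse sensitivity, support) for each class."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    rates = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # class k predicted as another class
        fp = sum(cm[i][k] for i in range(n)) - tp   # other classes predicted as k
        tn = total - tp - fn - fp
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        inverse = tn / (tn + fp) if tn + fp else 0.0
        rates.append((sensitivity, inverse, tp + fn))
    return rates


def weighted_average(values: List[float], weights: List[int]) -> float:
    """Average of values weighted by class support."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def cohen_kappa(cm: Matrix) -> float:
    """Agreement between true and predicted labels, corrected for chance."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = sum(cm[k][k] for k in range(n)) / total
    expected = sum(
        sum(cm[k]) * sum(cm[i][k] for i in range(n)) for k in range(n)
    ) / total ** 2
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical counts for three classes; not data from the article.
    example = [[50, 2, 1], [3, 40, 2], [0, 1, 30]]
    rates = per_class_rates(example)
    support = [r[2] for r in rates]
    print("weighted sensitivity:", weighted_average([r[0] for r in rates], support))
    print("weighted inverse sensitivity:", weighted_average([r[1] for r in rates], support))
    print("Cohen's kappa:", cohen_kappa(example))
```

Weighting by support keeps a rare class from dominating the averages, while kappa discounts the agreement that the class distribution alone would produce.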

Files in this item:

File	Size	Format
20190902-vcelak-j-ijmi-201906-article.pdf	6.28 MB	Adobe PDF

Please use this identifier to cite or link to this item: http://hdl.handle.net/11025/35854

All items in DSpace are protected by copyright, with all rights reserved.
