Unsupervised Feature Selection for Outlier Detection on Streaming Data to Enhance Network Security

Heigl, Michael; Weigelt, Enrico; Fiala, Dalibor; Schramm, Martin

Full metadata record

DC Field	Value	Language
dc.contributor.author	Heigl, Michael
dc.contributor.author	Weigelt, Enrico
dc.contributor.author	Fiala, Dalibor
dc.contributor.author	Schramm, Martin
dc.date.accessioned	2022-02-07T11:00:14Z	-
dc.date.available	2022-02-07T11:00:14Z	-
dc.date.issued	2021
dc.identifier.citation	HEIGL, M.; WEIGELT, E.; FIALA, D.; SCHRAMM, M. Unsupervised Feature Selection for Outlier Detection on Streaming Data to Enhance Network Security. Applied Sciences, 2021, vol. 11, no. 24, pp. 1-30. ISSN: 2076-3417	en
dc.identifier.issn	2076-3417
dc.identifier.uri	2-s2.0-85121349927
dc.identifier.uri	http://hdl.handle.net/11025/46759
dc.description.abstract	In the last several years, machine learning methods (especially those dealing with outlier detection) have been used in cybersecurity to detect network traffic anomalies rooted in novel attack patterns. However, the ubiquity of massive, continuously generated data streams poses an enormous challenge for efficient detection schemes. It requires fast, memory-efficient online algorithms that can cope with concept drift. Feature selection plays an important role in improving outlier detection by identifying noisy data that contain irrelevant or redundant features. Current research focuses either on unsupervised feature selection for streaming data or on (offline) outlier detection. This work formulates the essential requirements for combining both approaches and compares them with existing solutions. A comprehensive review revealed a gap in unsupervised feature selection for improving off-the-shelf outlier detection methods on data streams. We therefore propose a novel algorithm for unsupervised feature selection for outlier detection on streaming data, denoted UFSSOD, which is able to detect such values automatically. Moreover, it can determine the number of top-performing features by clustering their computed scores. We then derive a generic concept that shows two application scenarios of UFSSOD in conjunction with off-the-shelf online outlier detection algorithms. Extensive experiments have shown that a promising feature selection mechanism for streaming data cannot be applied in the field of outlier detection. Furthermore, UFSSOD, as an online-capable algorithm, achieves results comparable to those of a state-of-the-art offline method adapted for outlier detection.	en
dc.format	30 pp.	en
dc.format.mimetype	application/pdf
dc.language.iso	en	en
dc.publisher	MDPI	en
dc.relation.ispartofseries	Applied Sciences	en
dc.rights	© authors	en
dc.subject	feature selection, outlier detection, intrusion detection, network security, machine learning, online learning, unsupervised learning, streaming data	en
dc.title	Unsupervised Feature Selection for Outlier Detection on Streaming Data to Enhance Network Security	en
dc.title.alternative	Unsupervised feature selection for outlier detection in continuously arriving data to increase network security	en
dc.type	article	en
dc.rights.access	openAccess	en
dc.type.version	publishedVersion	en
dc.description.abstract-translated	Over the past couple of years, machine learning methods (especially outlier detection methods) have become anchored in the cybersecurity field to detect network-based anomalies rooted in novel attack patterns. However, the ubiquity of massive, continuously generated data streams poses an enormous challenge to efficient detection schemes. It demands fast, memory-constrained online algorithms that are capable of dealing with concept drift. Feature selection plays an important role in improving outlier detection by identifying noisy data that contain irrelevant or redundant features. State-of-the-art work focuses either on unsupervised feature selection for data streams or on (offline) outlier detection. Substantial requirements for combining both fields are derived and compared with existing approaches. The comprehensive review reveals a research gap in unsupervised feature selection for the improvement of outlier detection methods in data streams. Thus, a novel algorithm for Unsupervised Feature Selection for Streaming Outlier Detection, denoted as UFSSOD, is proposed, which is able to perform unsupervised feature selection for the purpose of outlier detection on streaming data. Furthermore, it is able to determine the number of top-performing features by clustering their score values. A generic concept that shows two application scenarios of UFSSOD in conjunction with off-the-shelf online outlier detection algorithms has been derived. Extensive experiments have shown that a promising feature selection mechanism for streaming data is not applicable in the field of outlier detection. Moreover, UFSSOD, as an online-capable algorithm, yields results comparable to those of a state-of-the-art offline method trimmed for outlier detection.	en
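The abstract states that UFSSOD determines the number of top-performing features by clustering their score values. It does not describe the clustering procedure itself. The Python sketch below shows one possible way to choose such a cut-off. It is a hypothetical illustration, not the authors' method. The function name `select_top_features` and the choice of an exact one-dimensional two-cluster split (2-means on the sorted scores) are assumptions.

```python
from typing import List, Sequence


def select_top_features(scores: Sequence[float]) -> List[int]:
    """Return indices of features in the higher-scoring of two 1-D clusters.

    Hypothetical illustration (not UFSSOD's published procedure): the cut-off
    is chosen by minimising the total within-cluster sum of squared deviations
    over every split of the descending-sorted scores, which is an exact
    2-means clustering in one dimension.
    """
    n = len(scores)
    if n < 2:
        # Nothing to split: keep whatever features exist.
        return list(range(n))

    # Feature indices ordered by descending score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    values = [scores[i] for i in order]

    def sse(chunk: Sequence[float]) -> float:
        # Sum of squared deviations from the chunk mean.
        mean = sum(chunk) / len(chunk)
        return sum((v - mean) ** 2 for v in chunk)

    best_k, best_cost = 1, float("inf")
    # k = number of features placed in the "top" cluster.
    for k in range(1, n):
        cost = sse(values[:k]) + sse(values[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost

    # Return the selected feature indices in their original order.
    return sorted(order[:best_k])


if __name__ == "__main__":
    example_scores = [0.9, 0.1, 0.85, 0.15, 0.8, 0.05]
    print(select_top_features(example_scores))  # [0, 2, 4]
```

With the made-up scores in the usage example, the three high scores (0.9, 0.85, 0.8) form one cluster, so features 0, 2 and 4 are kept. A fixed "top-k" would need k chosen in advance. A clustering-based cut-off instead adapts the number of selected features to the score distribution, which is the property the abstract attributes to UFSSOD.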
dc.subject.translated	feature selection	en
dc.subject.translated	outlier detection	en
dc.subject.translated	intrusion detection	en
dc.subject.translated	network security	en
dc.subject.translated	machine learning	en
dc.subject.translated	online learning	en
dc.subject.translated	unsupervised learning	en
dc.subject.translated	streaming data	en
dc.identifier.doi	10.3390/app112412073
dc.type.status	Peer-reviewed	en
dc.identifier.document-number	735828400001
dc.identifier.obd	43934615
Appears in Collections:	Conference Papers (KIV) OBD

Files in This Item:

File	Size	Format
Fiala applsci-11-12073.pdf	1.84 MB	Adobe PDF

Please use this identifier to cite or link to this item: http://hdl.handle.net/11025/46759