Title: | Shlukovací metody v data miningu |
Other titles: | Data mining with clustering |
Authors: | Klímek, Petr |
Source document citation: | E+M. Ekonomie a Management = Economics and Management. 2008, no. 2, pp. 120-126. |
Date of issue: | 2008 |
Publisher: | Technická univerzita v Liberci |
Document type: | article |
URI: | http://www.ekonomie-management.cz/download/1331826675_2e7a/11_klimek.pdf http://hdl.handle.net/11025/17234 |
ISSN: | 1212-3609 (Print) 2336-5604 (Online) |
Keywords: | data mining;clustering;metoda nejbližšího souseda;dendrogram |
Keywords in another language: | data mining;clustering;nearest neighbour method;dendrogram |
Abstract in another language: | Data mining is a new discipline lying at the interface of statistics, database technology, pattern recognition, machine learning, and other areas. It is concerned with the secondary analysis of large databases in order to find previously unsuspected relationships which are of interest or value to the database owners. There are two keys to success in data mining. The first is coming up with a precise formulation of the problem you are trying to solve. A focused statement usually results in the best payoff. The second key is using the right data. After choosing from the data available to you, or perhaps buying external data, you may need to transform and combine it in significant ways. New problems arise, partly as a consequence of the sheer size of the data sets involved, and partly because of issues of pattern matching. However, since statistics provides the intellectual glue underlying the effort, it is important for statisticians to become involved. There are very real opportunities for statisticians to make significant contributions. The main definition of data mining and the special data mining tasks are presented in the first part of this paper. The data mining problem was also discussed in previous issues of E+M. One method, clustering, was chosen as the subject of this article. One way to gain knowledge from data is to use cluster analysis, which belongs to the unsupervised methods of data mining. This paper focuses on that method. Some basic principles are described in the second part of this paper. The method is demonstrated on two examples from the marketing field. The first example uses Statgraphics 5.0Plus software (www.statgraphics.com) to solve a clustering problem (nearest neighbour algorithm and Euclidean distance), and the second example uses Statistica 6.0Cz software (from StatSoft, Inc., www.statsoft.com or www.statsoft.cz).
But building models is only one step in knowledge discovery. It is vital to properly collect and prepare the data, and to check the models against the real world. The "best" model is often found after building models of several different types, or by trying different technologies or algorithms. |
Rights: | © Technická univerzita v Liberci CC BY-NC 4.0 |
Appears in collections: | Issue 2 (2008) |
Files attached to this record:
File | Description | Size | Format | |
---|---|---|---|---|
11_klimek.pdf | Full text | 124.46 kB | Adobe PDF | View/open |
Use this identifier to cite or link to this record:
http://hdl.handle.net/11025/17234
All records in DSpace are protected by copyright, all rights reserved.