WORLD DATA CLUSTERING
ADEWALE O. MAKO
DATA MINING
INTRODUCTION:
Data mining is the analysis step of knowledge discovery in databases, a field at the intersection of computer science and statistics. It can also be defined as the analysis of large observational datasets to find unsuspected relationships. This definition refers to observational data as opposed to experimental data.
Data mining typically deals with data that has already been collected for some purpose other than the data mining analysis. For this reason it is often referred to as ‘secondary data analysis’.
The overall goal of the data mining process is to extract information from a dataset and transform it into an understandable structure for further use.
SCORE FUNCTIONS IN DATA MINING
A score function is a measure of one’s performance while making decisions under uncertainty. The purpose of a score function in data mining is to rank models according to how useful they are to the data miner. The chosen score function should reflect the overall goals of the data mining task as far as possible. Different score functions have different properties and are useful in different situations, so a score function should not be chosen merely because it is convenient; such a choice will often be inappropriate for the task at hand.
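As a rough illustration of how the choice of score function can change which model is ranked best, the sketch below scores two hypothetical models with two common score functions. The model names, the data values, and the choice of sum of squared errors and sum of absolute errors are all assumptions made for this example; they are not taken from any particular data mining task.

```python
def squared_error(actual, predicted):
    """Sum of squared errors: a single large miss is penalised heavily."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


def absolute_error(actual, predicted):
    """Sum of absolute errors: every unit of error counts the same."""
    return sum(abs(a - p) for a, p in zip(actual, predicted))


def rank_models(actual, predictions, score):
    """Return model names ordered from best (lowest score) to worst."""
    return sorted(predictions, key=lambda name: score(actual, predictions[name]))


# Hypothetical data: model_a is slightly wrong everywhere,
# model_b is exact except for one large miss.
actual = [10, 10, 10, 10]
predictions = {
    "model_a": [11, 11, 11, 11],
    "model_b": [10, 10, 10, 13],
}

print(rank_models(actual, predictions, squared_error))
print(rank_models(actual, predictions, absolute_error))
```

With these made-up numbers the two score functions disagree about which model is better, which is the practical reason the score function should be matched to the goal of the task rather than picked for convenience.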
CLUSTERING
Clustering is one of the most important unsupervised learning techniques. Like every other problem of this kind, it deals with finding structure in a collection of unlabelled data.
Clustering is in the eye of the beholder. In other words, there is no single, objectively correct clustering algorithm.
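To make the idea of grouping unlabelled data concrete, here is a minimal sketch of one widely used clustering method, Lloyd's algorithm for k-means. It is only one of many possible algorithms, and several choices in it are assumptions made for illustration: the sample points, the number of clusters k, Euclidean distance as the similarity measure, and the use of the first k points as starting centroids (chosen here only to keep the example deterministic).

```python
import math


def kmeans(points, k, iterations=100):
    """Minimal sketch of Lloyd's algorithm for k-means clustering.

    Assumption: the first k points are used as the initial centroids so
    that the example is deterministic; other initialisations are possible.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # No centroid moved, so the assignment is stable.
        centroids = new_centroids
    return labels, centroids


# Hypothetical unlabelled data: two visually separate groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Changing k, the distance measure, or the starting centroids can produce a different but equally defensible grouping, which is one sense in which clustering depends on the analyst's point of view.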
We can describe clustering as the process of organizing objects into groups whose members are similar in some particular way. In other words, a cluster is a collection of objects that are similar to one another and dissimilar to the objects belonging to other clusters. An advantage of clustering is