ALGORITHMS AND ANALYSIS USING SAS
BY: AHMED ALDAHHAN
SUPERVISED BY: LECTURER JING XU
BIRKBECK UNIVERSITY OF LONDON
2013/2014
ABSTRACT
This paper provides an introduction to cluster analysis. It gives general background, explains the concept of cluster analysis, and describes how clustering algorithms work. The basic idea and use of each clustering method are described along with its graphical features, and the different clustering techniques are explained with examples. The two main clustering techniques (hierarchical and K-means partitioning) are illustrated using a sample data set, the Iris flower data set (1936), and the two methods are compared on data suitability and model performance.
TABLE OF CONTENTS
CHAPTER 1
1.0 Introduction
1.1 Understanding Cluster Analysis
CHAPTER 2
2.0 Definitions
2.1 The Data Matrix
2.2 The Proximity Matrix
2.3 Similarity and Dissimilarity Matrices
2.4 Different Types of Clusters
2.4.1 Well-Separated Cluster Definition
2.4.2 Centre-Based Cluster Definition
2.4.3 Contiguity-Based Cluster Definition
2.4.4 Density-Based Cluster Definition
2.4.5 Shared-Property (Conceptual Clusters)
2.5 Distance Matrix
2.6 Hierarchical Clustering
2.6.1 Agglomerative Hierarchical Clustering
2.6.2 Divisive Hierarchical Clustering