Definition:
This nonhierarchical method begins by selecting as many components of the population as the final required number of clusters, chosen so that they are mutually farthest apart. These components serve as the initial clusters. Next, each remaining component in the population is examined and assigned to the cluster whose centroid is at the minimum distance. The centroid's position is recalculated every time a component is added to a cluster, and this continues until all the components are grouped into the final required number of clusters.
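The procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `sequential_cluster`, the `seed_indices` argument, and the use of Euclidean distance are choices made for this example, and the seeds are supplied by the caller rather than selected automatically.

```python
import math


def sequential_cluster(points, seed_indices):
    """Assign each point to the nearest centroid, updating centroids as points arrive.

    points: list of (x, y) tuples.
    seed_indices: indices of the points used as initial clusters (assumed to
    be chosen by the caller, e.g. by inspecting a plot).
    Returns (centroids, members); members[k] lists the point indices in cluster k.
    """
    centroids = [list(points[i]) for i in seed_indices]
    members = [[i] for i in seed_indices]

    for i, p in enumerate(points):
        if i in seed_indices:
            continue  # seeds already form their own clusters
        # Pick the cluster whose current centroid is closest to this point.
        k = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        members[k].append(i)
        n = len(members[k])
        # Running-mean update: the centroid becomes the mean of all n members.
        centroids[k] = [c + (x - c) / n for c, x in zip(centroids[k], p)]

    return centroids, members


# Toy data (hypothetical): two obvious groups along a line.
toy = [(0.0, 0.0), (10.0, 0.0), (1.0, 0.0), (9.0, 0.0)]
centroids, members = sequential_cluster(toy, seed_indices=[0, 1])
print(members)    # [[0, 2], [1, 3]]
print(centroids)  # [[0.5, 0.0], [9.5, 0.0]]
```

Because the centroid is updated after every assignment, the result can depend on the order in which the components are examined.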
Example:
Food item # | Protein content, P | Fat content, F |
---|---|---|
Food item #1 | 1.1 | 60 |
Food item #2 | 8.2 | 20 |
Food item #3 | 4.2 | 35 |
Food item #4 | 1.5 | 21 |
Food item #5 | 7.6 | 15 |
Food item #6 | 2.0 | 55 |
Food item #7 | 3.9 | 39 |
Let us plot these points to get a better understanding of the problem. The plot also lets us select the four points that are farthest apart.
We see from the graph that the distances between points 1 and 2, 1 and 3, 1 and 4, 1 and 5, 2 and 3, 2 and 4, and 3 and 4 are the largest.
Thus, the four initial clusters chosen are:
Cluster number | Protein content, P | Fat content, F |
---|---|---|
C1 | 1.1 | 60 |
C2 | 8.2 | 20 |
C3 | 4.2 | 35 |
C4 | 1.5 | 21 |
Also, we observe that point 1 is close to point 6, so both can be taken as one cluster. The resulting cluster is called C16. The value of P for the C16 centroid is (1.1 + 2.0)/2 = 1.55 and the value of F is (60 + 55)/2 = 57.50.
Upon closer observation, point 2 is close to point 5, so the two can be merged. The resulting cluster is called C25. The value of P for the C25 centroid is (8.2 + 7.6)/2 = 7.90 and the value of F is (20 + 15)/2 = 17.50.
Point 3 is close to point 7, so they can be merged into cluster C37. The value of P for the C37 centroid is (4.2 + 3.9)/2 = 4.05 and the value of F is (35 + 39)/2 = 37.
Point 4 is not close to any other point. So it remains alone in cluster number 4, i.e., C4, with P = 1.5 and F = 21 for the C4 centroid.
Cluster number | Protein content, P | Fat content, F |
---|---|---|
C16 | 1.55 | 57.50 |
C25 | 7.90 | 17.50 |
C37 | 4.05 | 37.00 |
C4 | 1.50 | 21.00 |
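The centroid values in the table can be checked with a few lines of Python. This is only a sketch of the arithmetic: the names `foods`, `clusters`, and `centroid` are introduced here for illustration, and the cluster memberships are taken directly from the example rather than computed.

```python
# Food items from the example: item number -> (protein P, fat F).
foods = {
    1: (1.1, 60), 2: (8.2, 20), 3: (4.2, 35), 4: (1.5, 21),
    5: (7.6, 15), 6: (2.0, 55), 7: (3.9, 39),
}

# Cluster memberships as stated in the example (not computed here).
clusters = {"C16": [1, 6], "C25": [2, 5], "C37": [3, 7], "C4": [4]}


def centroid(item_ids):
    """Return the mean (P, F) of the given food items."""
    n = len(item_ids)
    p = sum(foods[i][0] for i in item_ids) / n
    f = sum(foods[i][1] for i in item_ids) / n
    return p, f


for name, ids in clusters.items():
    p, f = centroid(ids)
    print(f"{name}: P = {p:.2f}, F = {f:.2f}")
```

Each centroid is simply the coordinate-wise average of the cluster's members, so a single-member cluster such as C4 keeps the coordinates of its only point.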
In the above example it was quite easy to estimate the distances between the points by eye. When the distances are harder to judge, one has to use the Euclidean metric to measure the distance between two points and so decide which cluster a point should be assigned to.
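For two points (P1, F1) and (P2, F2), the Euclidean distance is the square root of (P1 - P2)^2 + (F1 - F2)^2. As a hedged illustration, the snippet below computes the distance from food item #5 to each of the four initial clusters and picks the nearest one; the names `seeds`, `item5`, and `euclidean` are assumptions made for this sketch.

```python
import math

# Initial clusters from the example: name -> (protein P, fat F).
seeds = {"C1": (1.1, 60), "C2": (8.2, 20), "C3": (4.2, 35), "C4": (1.5, 21)}
item5 = (7.6, 15)  # food item #5


def euclidean(a, b):
    """Euclidean distance between two (P, F) points."""
    return math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)


# Distance from item #5 to every initial cluster.
distances = {name: euclidean(item5, c) for name, c in seeds.items()}
nearest = min(distances, key=distances.get)

for name, d in distances.items():
    print(f"{name}: {d:.2f}")
print("nearest:", nearest)  # nearest: C2
```

The smallest distance is to C2, which agrees with the grouping of points 2 and 5 in the example.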