We will use the Numbers data set. The data set contains images of handwritten digits. Recognizing handwritten digits is already a mature technology. The task of this project is to extract features and cluster the images into homogeneous groups. These groups do not necessarily have to be groups of the same digit; they can also group the data by the way a digit is written. Each image has 28x28 pixels with 256 gray values (8 bit). The data and some code to get you started can be found on the course web site under data for projects.
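To make the image format concrete, here is a minimal sketch of turning one image into a 28x28 grid of scaled gray values. It assumes each image arrives as a flat, row-major list of 784 integers in 0..255; the handout does not specify the file layout, so treat that as an assumption. The helper name `to_grid` is hypothetical, and the example image is synthetic rather than taken from the data set.

```python
SIDE = 28  # image width and height in pixels, from the task description


def to_grid(flat_pixels):
    """Reshape 784 gray values (0..255) into 28 rows of 28 floats in [0, 1].

    Assumes row-major order, which is NOT confirmed by the handout.
    """
    if len(flat_pixels) != SIDE * SIDE:
        raise ValueError(f"expected {SIDE * SIDE} values, got {len(flat_pixels)}")
    if any(not 0 <= p <= 255 for p in flat_pixels):
        raise ValueError("gray values must lie in 0..255")
    scaled = [p / 255.0 for p in flat_pixels]  # 8-bit value -> [0, 1]
    return [scaled[r * SIDE:(r + 1) * SIDE] for r in range(SIDE)]


# Synthetic example image: a vertical stroke in columns 13 and 14.
flat = [255 if c in (13, 14) else 0 for r in range(SIDE) for c in range(SIDE)]
grid = to_grid(flat)
```

Scaling to [0, 1] is optional, but it keeps distance-based methods from being dominated by the raw 0..255 range when pixel values are later mixed with other features.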
Follow the CRISP-DM framework:
1. Data Preparation
• Describe several ways you could preprocess the data and extract features. Explain why these steps might be helpful.
• Construct at least 3 additional features (more is better!).
• Perform cluster analysis using several methods (at least k-means and hierarchical clustering) for different features.
• How did you determine a suitable number of clusters for each method?
• Use internal validation measures to describe and compare the clusterings and the clusters (some visual methods would be good).
• Use external validation measures to describe the clusterings and the clusters. You can find the actual digits in the images in the file number_labels.csv.
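As an illustration of the feature-construction step in the list above, the following sketch computes three simple hand-crafted features from one image. These particular features (ink density, bounding-box aspect ratio, left-right asymmetry) and all function names are illustrative assumptions, not features prescribed by the assignment. The input is assumed to be a 28x28 grid of gray values scaled to [0, 1], and the example image is synthetic.

```python
def ink_density(grid):
    """Mean gray value: how much of the image is covered by ink."""
    return sum(map(sum, grid)) / (len(grid) * len(grid[0]))


def bbox_aspect_ratio(grid, threshold=0.5):
    """Height / width of the smallest box containing all pixels above threshold."""
    rows = [r for r, row in enumerate(grid) if any(v > threshold for v in row)]
    cols = [c for c in range(len(grid[0])) if any(row[c] > threshold for row in grid)]
    if not rows:
        return 0.0  # blank image: no ink, so no bounding box
    return (rows[-1] - rows[0] + 1) / (cols[-1] - cols[0] + 1)


def horizontal_asymmetry(grid):
    """Mean absolute difference between the image and its left-right mirror."""
    total = sum(abs(a - b) for row in grid for a, b in zip(row, reversed(row)))
    return total / (len(grid) * len(grid[0]))


def extract_features(grid):
    """Feature vector for one image, in a fixed order."""
    return [ink_density(grid), bbox_aspect_ratio(grid), horizontal_asymmetry(grid)]


# Synthetic 28x28 image: a vertical stroke, 20 pixels tall and 2 pixels wide.
grid = [[1.0 if 4 <= r < 24 and c in (13, 14) else 0.0 for c in range(28)]
        for r in range(28)]
features = extract_features(grid)
```

Features like these live on different scales (a density in [0, 1] next to an aspect ratio that can be much larger), so standardizing them before clustering is usually worth considering.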
Prepare the code for each method.
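One possible starting point for the k-means part is sketched below using only the Python standard library. It is a plain implementation of Lloyd's algorithm, not the course's starter code, together with the within-cluster sum of squares (an internal measure, often plotted against k to look for an elbow) and purity (an external measure that needs the true digits). The data here are synthetic two-dimensional feature vectors with made-up labels; with the real data, the labels would come from number_labels.csv. All function names are hypothetical.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50, seed=0):
    """Lloyd's algorithm with random initial centers; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def within_ss(points, labels, centers):
    """Internal measure: total within-cluster sum of squares."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


def purity(cluster_labels, true_labels):
    """External measure: share of points that belong to their cluster's majority class."""
    by_cluster = {}
    for c, t in zip(cluster_labels, true_labels):
        by_cluster.setdefault(c, Counter())[t] += 1
    majority = sum(cnt.most_common(1)[0][1] for cnt in by_cluster.values())
    return majority / len(true_labels)


# Synthetic data: two well-separated blobs standing in for two digit classes.
rng = random.Random(42)
points = ([[rng.gauss(0, 0.1), rng.gauss(0, 0.1)] for _ in range(20)]
          + [[rng.gauss(5, 0.1), rng.gauss(5, 0.1)] for _ in range(20)])
true_digits = [0] * 20 + [1] * 20

labels, centers = kmeans(points, k=2)
wss_by_k = {k: within_ss(points, *kmeans(points, k)) for k in range(1, 5)}
score = purity(labels, true_digits)
```

Purity rises trivially as the number of clusters grows, so it is best reported alongside the number of clusters or a chance-corrected measure. Hierarchical clustering would need its own implementation or a library routine; the two validation functions above apply to its cluster labels unchanged.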