Programming Exercise: K-means Clustering and Principal Component Analysis

Introduction

In this exercise, you will implement the K-means clustering algorithm and apply it to compress an image. In the second part, you will use principal component analysis to find a low-dimensional representation of face images. Before starting on the programming exercise, we strongly recommend watching the video lectures and completing the review questions for the associated topics.

To get started with the exercise, you will need to download the starter code and unzip its contents to the directory where you wish to complete the exercise. If needed, use the cd command in Octave/MATLAB to change to this directory before starting this exercise.
You can also find instructions for installing Octave/MATLAB in the "Environment Setup Instructions" of the course website.

1 K-means Clustering

In this exercise, you will implement the K-means algorithm and use it for image compression. You will first start on an example 2D dataset that will help you gain an intuition of how the K-means algorithm works. After that, you will use the K-means algorithm for image compression by reducing the number of colors that occur in an image to only those that are most common in that image. You will be using ex7.m for this part of the exercise.

You will implement the two phases of the K-means algorithm separately in the next sections.

Finding closest centroids
Your task is to complete the code in findClosestCentroids.m. This function takes the data matrix X and the locations of all centroids inside centroids and should output a one-dimensional array idx that holds the index (a value in $\{1, \ldots, K\}$, where $K$ is the total number of centroids) of the closest centroid to every training example. That is, for every example $i$ you set $c^{(i)} := \arg\min_j \lVert x^{(i)} - \mu_j \rVert^2$, where $\mu_j$ is the position of the $j$-th centroid.

You can implement this using a loop over every training example and every centroid.
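A minimal pure-Python sketch of that double loop follows, assuming X and centroids are lists of equal-length numeric lists. The function name find_closest_centroids is a hypothetical stand-in for findClosestCentroids.m, indices are 1-based to match the text, and the toy data is made up for illustration.

```python
def find_closest_centroids(X, centroids):
    """Return, for each example in X, the 1-based index of its nearest centroid."""
    idx = []
    for x in X:
        best_j, best_dist = 1, float("inf")
        for j, mu in enumerate(centroids, start=1):
            # Squared Euclidean distance ||x - mu||^2 (no square root needed
            # because it does not change which centroid is closest).
            dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
            if dist < best_dist:  # strict '<' keeps the lowest index on ties
                best_j, best_dist = j, dist
        idx.append(best_j)
    return idx


X = [[1.0, 1.0], [9.0, 9.0], [5.0, 4.0]]
centroids = [[0.0, 0.0], [5.0, 5.0], [10.0, 10.0]]
print(find_closest_centroids(X, centroids))  # [1, 3, 2]
```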

Once you have completed the code in findClosestCentroids.m, the script ex7.m will run your code and you should see the output [1 3 2] corresponding to the centroid assignments for the first 3 examples.

Computing centroid means

You should now complete the code in computeCentroids.m. For every centroid $k$, set $\mu_k := \frac{1}{|C_k|} \sum_{i \in C_k} x^{(i)}$, where $C_k$ is the set of examples assigned to centroid $k$. You can implement this function using a loop over the centroids. You can also use a loop over the examples, but a vectorized implementation that avoids such a loop may run faster.
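The update rule can be sketched in Python with a loop over the centroids, assuming idx holds 1-based assignments like the output of the previous step. compute_centroids is a hypothetical stand-in for computeCentroids.m, and what to do with a centroid that has no assigned examples is not specified by the exercise; leaving it at the origin here is purely an assumption.

```python
def compute_centroids(X, idx, K):
    """Return K centroids, each the mean of the examples assigned to it.

    X: list of examples (lists of floats); idx: 1-based cluster index per example.
    """
    n = len(X[0])
    centroids = []
    for k in range(1, K + 1):
        members = [x for x, c in zip(X, idx) if c == k]
        if not members:
            # Assumption: an empty cluster is placed at the origin.
            centroids.append([0.0] * n)
            continue
        # zip(*members) iterates over the feature columns of the members.
        centroids.append([sum(col) / len(members) for col in zip(*members)])
    return centroids


X = [[1.0, 1.0], [3.0, 3.0], [10.0, 10.0]]
print(compute_centroids(X, [1, 1, 2], 2))  # [[2.0, 2.0], [10.0, 10.0]]
```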

After you have completed the two functions (findClosestCentroids and computeCentroids), the next step in ex7.m will run the K-means algorithm on a toy 2D dataset to help you understand how K-means works. Your functions are called from inside the runKmeans.m script. We encourage you to take a look at the function to understand how it works. Notice that the code calls the two functions you implemented in a loop.
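To make the alternation concrete, here is a self-contained Python sketch of the outer loop that runKmeans.m is described as running. The name run_kmeans is hypothetical, it uses 0-based cluster indices (the Python convention), and keeping an empty cluster's previous centroid is an assumption rather than something the exercise specifies.

```python
def run_kmeans(X, initial_centroids, max_iters):
    """Alternate the assignment and update steps for max_iters iterations.

    X: list of examples (lists of floats); initial_centroids: list of K points.
    Returns (centroids, idx) with 0-based cluster indices.
    """
    centroids = [list(c) for c in initial_centroids]
    idx = []
    for _ in range(max_iters):
        # Assignment step (the role of findClosestCentroids).
        idx = [min(range(len(centroids)),
                   key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centroids[j])))
               for x in X]
        # Update step (the role of computeCentroids).
        for k in range(len(centroids)):
            members = [x for x, c in zip(X, idx) if c == k]
            if members:  # assumption: an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, idx


X = [[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 8.5]]
centroids, idx = run_kmeans(X, [[0.0, 0.0], [10.0, 10.0]], max_iters=5)
print(centroids, idx)  # [[1.25, 1.5], [8.5, 8.25]] [0, 0, 1, 1]
```

On this toy data the assignments stop changing after the first iteration, so further iterations leave the centroids where they are.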

When you run the next step, the K-means code will produce a visualization that steps you through the progress of the algorithm at each iteration. Press Enter multiple times to see how each step of the K-means algorithm changes the centroids and cluster assignments.

Random initialization
The initial assignments of centroids for the example dataset in ex7.m were designed so that you will see the same figure as in Figure 1. In practice, a good strategy for initializing the centroids is to select random examples from the training set.
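Random initialization from the training set can be sketched in a few lines of Python. kmeans_init_centroids is a hypothetical name, and the optional rng argument is an assumption added so that runs can be made reproducible; random.sample draws without replacement, so no training example is picked twice.

```python
import random


def kmeans_init_centroids(X, K, rng=random):
    """Pick K distinct training examples at random as the initial centroids."""
    # Copy each chosen example so later centroid updates do not modify X.
    return [list(x) for x in rng.sample(X, K)]


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
init = kmeans_init_centroids(X, 2, random.Random(0))
print(init)
```

Because different random starts can converge to different clusterings, it is common to run K-means several times and keep the result with the lowest distortion.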

In this exercise, you will apply K-means to image compression. In a straightforward 24-bit color representation of an image, each pixel is represented as three 8-bit unsigned integers (ranging from 0 to 255) that specify the red, green and blue intensity values. This encoding is often referred to as the RGB encoding. Our image contains thousands of colors, and in this part of the exercise, you will reduce the number of colors to 16 colors.

By making this reduction, it is possible to represent (compress) the photo in an efficient way. Specifically, you only need to store the RGB values of the 16 selected colors, and for each pixel in the image you need to store only the index of the color at that location (only 4 bits are necessary to represent 16 possibilities).
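The storage saving can be checked with simple arithmetic. The Python sketch below assumes a hypothetical 128 × 128 image (the size is chosen only for illustration) and counts the 24-bit palette plus a ceil(log2 K)-bit index per pixel, exactly as described above.

```python
import math


def storage_bits(num_pixels, K, bits_per_channel=8, channels=3):
    """Return (original_bits, compressed_bits) for a K-color palette encoding."""
    color_bits = bits_per_channel * channels      # 24 bits per RGB color
    index_bits = math.ceil(math.log2(K))          # 4 bits when K = 16
    original = num_pixels * color_bits
    compressed = K * color_bits + num_pixels * index_bits
    return original, compressed


# Hypothetical 128 x 128 image compressed to K = 16 colors.
print(storage_bits(128 * 128, 16))  # (393216, 65920)
```

For that size, the compressed form needs roughly one sixth of the original bits; the palette itself (384 bits) is negligible next to the per-pixel indices.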

In this exercise, you will use the K-means algorithm to select the 16 colors that will be used to represent the compressed image. Concretely, you will treat every pixel in the original image as a data example and use the K-means algorithm to find the 16 colors that best group (cluster) the pixels in the 3-dimensional RGB space. Once you have computed the cluster centroids on the image, you will then use the 16 colors to replace the pixels in the original image.
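The final replacement step can be sketched in Python as follows, assuming pixels is a flat list of (r, g, b) tuples and palette holds the K centroid colors found by K-means. The function name recolor and the two-color palette below are hypothetical, chosen only to show the mapping.

```python
def recolor(pixels, palette):
    """Replace each RGB pixel with the nearest color in palette.

    Returns (idx, recolored): the per-pixel palette index (what the compressed
    image stores) and the pixels rebuilt from the palette.
    """
    idx = [min(range(len(palette)),
               key=lambda k: sum((p - c) ** 2 for p, c in zip(px, palette[k])))
           for px in pixels]
    return idx, [palette[k] for k in idx]


pixels = [(250, 10, 10), (240, 20, 5), (10, 10, 200), (0, 0, 255)]
palette = [(245, 15, 8), (5, 5, 228)]  # hypothetical K = 2 centroids
idx, recolored = recolor(pixels, palette)
print(idx)  # [0, 0, 1, 1]
```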

Optional (ungraded) exercise: Use your own image
In this exercise, modify the code we have supplied to run on one of your own images. Note that if your image is very large, then K-means can take a long time to run. Therefore, we recommend that you resize your images to manageable sizes before running the code. You can also try to vary K to see the effects on the compression.

2 Principal Component Analysis
In this exercise, you will use principal component analysis (PCA) to perform dimensionality reduction. You will first experiment with an example 2D dataset to get intuition on how PCA works, and then use it on a bigger dataset of 5000 face images.

Implementing PCA
In this part of the exercise, you will implement PCA. PCA consists of two computational steps. First, you compute the covariance matrix of the data, $\Sigma = \frac{1}{m} X^T X$, where $X$ is the data matrix with one example per row and $m$ is the number of examples.

Then, you use Octave/MATLAB's svd function to compute the eigenvectors $U_1, U_2, \ldots, U_n$. These will correspond to the principal components of variation in the data.
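For the 2D example dataset, both steps can be sketched in pure Python. This is not the svd-based approach the exercise asks for: in its place it uses the closed-form eigendecomposition of a symmetric 2 × 2 matrix, which for a symmetric positive semidefinite matrix such as $\Sigma$ yields the same principal directions as svd (up to sign). The names covariance and pca_2d are hypothetical, and the input is assumed to be already mean-normalized.

```python
import math


def covariance(X):
    """Sigma = (1/m) X^T X for an m x n list of already mean-normalized rows."""
    m, n = len(X), len(X[0])
    return [[sum(row[i] * row[j] for row in X) / m for j in range(n)]
            for i in range(n)]


def pca_2d(X):
    """Return (U, S): unit eigenvectors as the columns of U, eigenvalues S, largest first."""
    (a, b), (_, c) = covariance(X)  # Sigma = [[a, b], [b, c]]
    half_gap = math.hypot((a - c) / 2, b)
    l1, l2 = (a + c) / 2 + half_gap, (a + c) / 2 - half_gap
    if b != 0:
        v = (l1 - c, b)  # solves (Sigma - l1 * I) v = 0
    else:
        v = (1.0, 0.0) if a >= c else (0.0, 1.0)  # Sigma is already diagonal
    norm = math.hypot(*v)
    u1 = (v[0] / norm, v[1] / norm)
    u2 = (-u1[1], u1[0])  # the second component is orthogonal to the first
    U = [[u1[0], u2[0]], [u1[1], u2[1]]]  # columns are u1 and u2
    return U, [l1, l2]


X = [[1.0, 1.0], [-1.0, -1.0], [2.0, 2.0], [-2.0, -2.0]]  # zero-mean toy data
U, S = pca_2d(X)
print(U, S)
```

For higher-dimensional data such as the faces, a general-purpose SVD (as the exercise prescribes) is the practical choice; the closed form above only works for n = 2.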
Before using PCA, it is important to first normalize the data by subtracting the mean value of each feature from the dataset, and scaling each dimension so that all features are in the same range. In the provided script ex7_pca.m, this normalization has been performed for you using the featureNormalize function.
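A Python sketch of this normalization is shown below. feature_normalize is a hypothetical stand-in for the provided featureNormalize function, not a copy of it, and scaling by the sample standard deviation (divisor m − 1, via statistics.stdev) is an assumption. A feature with zero spread would cause a division by zero and would need separate handling.

```python
import statistics


def feature_normalize(X):
    """Subtract each feature's mean and divide by its standard deviation.

    Returns (X_norm, mu, sigma) so the same mu and sigma can be reused later.
    """
    columns = list(zip(*X))  # one tuple per feature
    mu = [statistics.fmean(col) for col in columns]
    sigma = [statistics.stdev(col) for col in columns]  # sample std (m - 1)
    X_norm = [[(x - m) / s for x, m, s in zip(row, mu, sigma)] for row in X]
    return X_norm, mu, sigma


X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
X_norm, mu, sigma = feature_normalize(X)
print(X_norm)  # [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
```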

Face Image Dataset

In this part of the exercise, you will run PCA on face images to see how it can be used in practice for dimension reduction. The dataset ex7faces.mat contains a dataset X of face images, each 32 × 32 in grayscale. Each row of X corresponds to one face image (a row vector of length 1024). The next step in ex7_pca.m will load and visualize the first 100 of these face images (Figure 7).

PCA on Faces
To run PCA on the face dataset, we first normalize the dataset by subtracting the mean of each feature from the data matrix X. The script ex7_pca.m will do this for you and then run your PCA code. After running PCA, you will obtain the principal components of the dataset. Notice that each principal component in U (each column) is a vector of length n (where for the face dataset, n = 1024). It turns out that we can visualize these principal components by reshaping each of them into a 32 × 32 matrix that corresponds to the pixels in the original dataset. The script ex7_pca.m displays the first 36 principal components that describe the largest variations (Figure 8). If you want, you can also change the code to display more principal components to see how they capture more and more details.
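The reshaping can be sketched in Python as below. Octave/MATLAB's reshape fills the result column by column, so this sketch does the same; it assumes the face pixels were flattened in that column-major order, and the helper names reshape_column_major and column are hypothetical.

```python
def reshape_column_major(vec, rows, cols):
    """Reshape a flat list into rows x cols, filling column by column.

    This mirrors the column-major order used by Octave/MATLAB's reshape.
    """
    if len(vec) != rows * cols:
        raise ValueError("vector length must equal rows * cols")
    return [[vec[c * rows + r] for c in range(cols)] for r in range(rows)]


def column(U, j):
    """Extract column j (0-based) of a matrix stored as a list of rows."""
    return [row[j] for row in U]


# Tiny stand-in for a 1024-element component: 4 elements reshaped to 2 x 2.
print(reshape_column_major([1, 2, 3, 4], 2, 2))  # [[1, 3], [2, 4]]
```

For the face data, the k-th component image would be reshape_column_major(column(U, k), 32, 32).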

Dimensionality Reduction
Now that you have computed the principal components for the face dataset, you can use them to reduce the dimension of the face dataset. This allows you to use your learning algorithm with a smaller input size (e.g., 100 dimensions) instead of the original 1024 dimensions, which can help speed up your learning algorithm.

The next part in ex7_pca.m will project the face dataset onto only the first 100 principal components. Concretely, each face image is now described by a vector $z^{(i)} \in \mathbb{R}^{100}$.
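The projection computes $z^{(i)} = U_{\text{reduce}}^T x^{(i)}$, where $U_{\text{reduce}}$ holds the first K columns of U. The Python sketch below does this for every row of X; project_data is a hypothetical name, and the small orthonormal U in the example is made up (for the faces, K would be 100 and each row would have 1024 entries).

```python
def project_data(X, U, K):
    """Project each row x of X onto the first K columns of U: z = U_reduce^T x."""
    n = len(U)
    return [[sum(x[i] * U[i][k] for i in range(n)) for k in range(K)]
            for x in X]


# Hypothetical orthonormal basis: columns (0.6, 0.8) and (-0.8, 0.6).
U = [[0.6, -0.8],
     [0.8, 0.6]]
Z = project_data([[3.0, 4.0]], U, K=1)
print(Z)  # one coordinate per example, approximately 5.0
```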

In the earlier K-means image compression exercise, you used the K-means algorithm in the 3-dimensional RGB space. In the last part of the ex7_pca.m script, we have provided code to visualize the final pixel assignments in this 3D space using the scatter3 function. Each data point is colored according to the cluster it has been assigned to. You can drag your mouse on the figure to rotate and inspect this data in 3 dimensions.
