Ask Computer Engineering Expert

Programming Exercise: K-means Clustering and Principal Component Analysis

Introduction

In this exercise, you will implement the K-means clustering algorithm and apply it to compress an image. In the second part, you will use principal component analysis to find a low-dimensional representation of face images. Before starting on the programming exercise, we strongly recommend watch- ing the video lectures and completing the review questions for the associated topics.

To get started with the exercise, you will need to download the starter code and unzip its contents to the directory where you wish to complete the exercise. If needed, use the cd command in Octave/MATLAB to change to this directory before starting this exercise.
You can also find instructions for installing Octave/MATLAB in the "En- vironment Setup Instructions" of the course website.

1 K-means Clustering

In this this exercise, you will implement the K-means algorithm and use it for image compression. You will first start on an example 2D dataset that will help you gain an intuition of how the K-means algorithm works. After that, you wil use the K-means algorithm for image compression by reducing the number of colors that occur in an image to only those that are most common in that image. You will be using ex7.m for this part of the exercise.

You will implement the two phases of the K-means algorithm separately in the next sections.

Finding closest centroids
Your task is to complete the code in findClosestCentroids.m. This function takes the data matrix X and the locations of all centroids inside centroids and should output a one-dimensional array idx that holds the index (a value in 1, ..., K , where K is total number of centroids) of the closest centroid to every training example.

You can implement this using a loop over every training example and every centroid.

Once you have completed the code in findClosestCentroids.m, the script ex7.m will run your code and you should see the output [1 3 2] corresponding to the centroid assignments for the first 3 examples.

Computing centroid means

You should now complete the code in computeCentroids.m. You can implement this function using a loop over the centroids. You can also use a loop over the examples; but if you can use a vectorized implementation that does not use such a loop, your code may run faster.

After you have completed the two functions (findClosestCentroids and computeCentroids), the next step in ex7.m will run the K-means algorithm on a toy 2D dataset to help you understand how K-means works. Your functions are called from inside the runKmeans.m script. We encourage you to take a look at the function to understand how it works. Notice that the code calls the two functions you implemented in a loop.

When you run the next step, the K-means code will produce a visualiza- tion that steps you through the progress of the algorithm at each iteration. Press enter multiple times to see how each step of the K-means algorithm changes the centroids and cluster assignments.

Random initialization
The initial assignments of centroids for the example dataset in ex7.m were designed so that you will see the same figure as in Figure1. In practice, a good strategy for initializing the centroids is to select random examples from the training set.

In this exercise, you will apply K-means to image compression. In a straightforward 24-bit color representation of an image,2 each pixel is repre- sented as three 8-bit unsigned integers (ranging from 0 to 255) that specify the red, green and blue intensity values. This encoding is often refered to as the RGB encoding. Our image contains thousands of colors, and in this part of the exercise, you will reduce the number of colors to 16 colors.

By making this reduction, it is possible to represent (compress) the photo in an efficient way. Specifically, you only need to store the RGB values of the 16 selected colors, and for each pixel in the image you now need to only store the index of the color at that location (where only 4 bits are necessary to represent 16 possibilities).

In this exercise, you will use the K-means algorithm to select the 16 colors that will be used to represent the compressed image. Concretely, you will treat every pixel in the original image as a data example and use the K-means algorithm to find the 16 colors that best group (cluster) the pixels in the 3- dimensional RGB space. Once you have computed the cluster centroids on the image, you will then use the 16 colors to replace the pixels in the original image.

Optional (ungraded) exercise: Use your own image
In this exercise, modify the code we have supplied to run on one of your own images. Note that if your image is very large, then K-means can take a long time to run. Therefore, we recommend that you resize your images to managable sizes before running the code. You can also try to vary K to see the effects on the compression.

2 Principal Component Analysis
In this exercise, you will use principal component analysis (PCA) to perform dimensionality reduction. You will first experiment with an example 2D dataset to get intuition on how PCA works, and then use it on a bigger dataset of 5000 face image dataset.

Implementing PCA
In this part of the exercise, you will implement PCA. PCA consists of two computational steps: First, you compute the covariance matrix of the data.

Then, you use Octave/MATLAB's SVD function to compute the eigenvec- tors U1, U2, . . . , Un. These will correspond to the principal components of variation in the data.
Before using PCA, it is important to first normalize the data by subtract- ing the mean value of each feature from the dataset, and scaling each dimen- sion so that they are in the same range. In the provided script ex7 pca.m, this normalization has been performed for you using the featureNormalize function.

Face Image Dataset

In this part of the exercise, you will run PCA on face images to see how it can be used in practice for dimension reduction. The dataset ex7faces.mat contains a dataset3 X of face images, each 32 32 in grayscale. Each row of X corresponds to one face image (a row vector of length 1024). The next step in ex7 pca.m will load and visualize the first 100 of these face images (Figure7).

PCA on Faces
To run PCA on the face dataset, we first normalize the dataset by subtracting the mean of each feature from the data matrix X. The script ex7 pca.m will do this for you and then run your PCA code. After running PCA, you will obtain the principal components of the dataset. Notice that each principal component in U (each row) is a vector of length n (where for the face dataset, n = 1024). It turns out that we can visualize these principal components by reshaping each of them into a 32 32 matrix that corresponds to the pixels in the original dataset. The script ex7 pca.m displays the first 36 principal components that describe the largest variations (Figure8). If you want, you can also change the code to display more principal components to see how they capture more and more details.

Dimensionality Reduction
Now that you have computed the principal components for the face dataset, you can use it to reduce the dimension of the face dataset. This allows you to use your learning algorithm with a smaller input size (e.g., 100 dimensions) instead of the original 1024 dimensions. This can help speed up your learning algorithm.

The next part in ex7 pca.m will project the face dataset onto only the first 100 principal components. Concretely, each face image is now described by a vector z(i) R100.

In the earlier K-means image compression exercise, you used the K-means algorithm in the 3-dimensional RGB space. In the last part of the ex7 pca.m script, we have provided code to visualize the final pixel assignments in this 3D space using the scatter3 function. Each data point is colored according to the cluster it has been assigned to. You can drag your mouse on the figure to rotate and inspect this data in 3 dimensions.

Attachment:- implement the K-means clustering algorithm.rar

Computer Engineering, Engineering

  • Category:- Computer Engineering
  • Reference No.:- M92738935
  • Price:- $50

Priced at Now at $50, Verified Solution

Have any Question?


Related Questions in Computer Engineering

Does bmw have a guided missile corporate culture and

Does BMW have a guided missile corporate culture, and incubator corporate culture, a family corporate culture, or an Eiffel tower corporate culture?

Rebecca borrows 10000 at 18 compounded annually she pays

Rebecca borrows $10,000 at 18% compounded annually. She pays off the loan over a 5-year period with annual payments, starting at year 1. Each successive payment is $700 greater than the previous payment. (a) How much was ...

Jeff decides to start saving some money from this upcoming

Jeff decides to start saving some money from this upcoming month onwards. He decides to save only $500 at first, but each month he will increase the amount invested by $100. He will do it for 60 months (including the fir ...

Suppose you make 30 annual investments in a fund that pays

Suppose you make 30 annual investments in a fund that pays 6% compounded annually. If your first deposit is $7,500 and each successive deposit is 6% greater than the preceding deposit, how much will be in the fund immedi ...

Question -under what circumstances is it ethical if ever to

Question :- Under what circumstances is it ethical, if ever, to use consumer information in marketing research? Explain why you consider it ethical or unethical.

What are the differences between four types of economics

What are the differences between four types of economics evaluations and their differences with other two (budget impact analysis (BIA) and cost of illness (COI) studies)?

What type of economic system does norway have explain some

What type of economic system does Norway have? Explain some of the benefits of this system to the country and some of the drawbacks,

Among the who imf and wto which of these governmental

Among the WHO, IMF, and WTO, which of these governmental institutions do you feel has most profoundly shaped healthcare outcomes in low-income countries and why? Please support your reasons with examples and research/doc ...

A real estate developer will build two different types of

A real estate developer will build two different types of apartments in a residential area: one- bedroom apartments and two-bedroom apartments. In addition, the developer will build either a swimming pool or a tennis cou ...

Question what some of the reasons that evolutionary models

Question : What some of the reasons that evolutionary models are considered by many to be the best approach to software development. The response must be typed, single spaced, must be in times new roman font (size 12) an ...

  • 4,153,160 Questions Asked
  • 13,132 Experts
  • 2,558,936 Questions Answered

Ask Experts for help!!

Looking for Assignment Help?

Start excelling in your Courses, Get help with Assignment

Write us your full requirement for evaluation and you will receive response within 20 minutes turnaround time.

Ask Now Help with Problems, Get a Best Answer

Why might a bank avoid the use of interest rate swaps even

Why might a bank avoid the use of interest rate swaps, even when the institution is exposed to significant interest rate

Describe the difference between zero coupon bonds and

Describe the difference between zero coupon bonds and coupon bonds. Under what conditions will a coupon bond sell at a p

Compute the present value of an annuity of 880 per year

Compute the present value of an annuity of $ 880 per year for 16 years, given a discount rate of 6 percent per annum. As

Compute the present value of an 1150 payment made in ten

Compute the present value of an $1,150 payment made in ten years when the discount rate is 12 percent. (Do not round int

Compute the present value of an annuity of 699 per year

Compute the present value of an annuity of $ 699 per year for 19 years, given a discount rate of 6 percent per annum. As