
K-Means Clustering

Part 1 -

1. Introduction - Cluster analysis groups items so that the members of each group are more similar to each other than to members of other groups. There are numerous clustering algorithms and models, such as k-means clustering, hierarchical clustering, and density-based models.

Cluster analysis is popular in market segmentation. For example, the market for a product or service may be segmented into groups of customers or regions that share common interests or are similar in terms of their preferences and socio-economic attributes. An appropriate marketing strategy may then be devised to better serve the needs of the identified segments.

In this part, you will apply k-means clustering to a data mining study in your domain of interest, using R and RStudio.
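For reference, here is a sketch of the standard k-means objective and the sum-of-squares decomposition behind R's kmeans output. The notation is ours, not the course's: observations x_1, ..., x_n, clusters C_1, ..., C_k with centroids μ_j, and overall mean x̄. k-means searches for the partition that minimizes the total within-cluster sum of squares W.

```latex
\begin{aligned}
W &= \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
  \qquad \mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i \\
T &= \sum_{i=1}^{n} \lVert x_i - \bar{x} \rVert^2 = W + B,
  \qquad B = \sum_{j=1}^{k} |C_j| \, \lVert \mu_j - \bar{x} \rVert^2
\end{aligned}
```

In R's kmeans result, T, W, and B correspond to totss, tot.withinss, and betweenss. The printed "between_SS / total_SS" percentage is B / T. R's default algorithm (Hartigan-Wong) is a heuristic for minimizing W, so different random starts can reach different local optima.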

2. Steps to Completion - For each study, the general procedure is to:

  • Review the theoretical background using the resources available in the course content
  • Select a dataset from the module's recommended datasets list
  • Run an analysis, perform evaluation, and capture the results
  • Document your findings and analysis in a data mining analytical report

3. Deliverables - Submit an analysis report that addresses the following critical areas:

Introduction: give some background and context about the domain of application, provide the rationale for the type of analysis, and state the objective clearly.

Analysis: describe the data both qualitatively and quantitatively through exploratory analysis, perform the necessary preprocessing, explain the intuition behind the algorithm and its core parameters, demonstrate the model-building steps along with parameter tuning, and explain all your assumptions.

Results: explain the results and interpret the model output in terms that reflect the application area, evaluate the model using appropriate metrics, and use visualizations to support the interpretation.

Conclusion: summarize your main findings, discuss experimental limitations related to the data and/or the implementation of the algorithm, and suggest areas for improvement as potential future work.

Miscellaneous:

  • Proofread your report for correct structure, grammar, and spelling
  • Follow APA formatting and list all references
  • Include your R script and extended model outputs in an Appendix section.

The report should be 7-10 pages long, excluding the title page, appendix, and R script.

Part 2 -

Run the following exercise on a vehicle dataset. Then write a report, in your own words, on your findings and your interpretation of the results. The report must cover the key points of the exercise below, in order.

Download the vehicle.csv file to your hard drive.

1. Introduction - What do you expect the k-means clustering method to accomplish for the vehicle data?

2. Data pre-processing

  • Run the set.seed command. Include the command in the report and explain why you run it.
  • Load the data from the vehicle.csv file into R. Create a copy of the vehicle dataset called myvehicle. Include the command in the report.
  • Remove the class variable from myvehicle. Include the command in the report, and explain why the class variable is removed.
  • Run the scale command on myvehicle. Include the command in the report, and explain why the data is scaled.
  • Discuss any additional data pre-processing that you perform. Include the commands in the report and explain what each one does.
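The pre-processing steps above can be sketched as follows. This is a minimal sketch under stated assumptions. vehicle.csv is not available here, so R's built-in iris data stands in for it, with the iris label column renamed to "class". The seed value 123 is arbitrary. The missing-value check is one example of "additional pre-processing", not a required step.

```r
# ASSUMPTION: iris stands in for vehicle.csv. With the real file, replace the
# two "stand-in" lines with:  vehicle <- read.csv("vehicle.csv")

set.seed(123)  # arbitrary seed; makes kmeans' random starting centers reproducible

vehicle <- iris                                          # stand-in
names(vehicle)[names(vehicle) == "Species"] <- "class"   # stand-in

# Additional pre-processing: count and drop incomplete rows before copying,
# so that the feature rows and the class labels stay aligned
cat("Rows with missing values:", sum(!complete.cases(vehicle)), "\n")
vehicle <- vehicle[complete.cases(vehicle), ]

myvehicle <- vehicle        # work on a copy; 'vehicle' keeps the labels for evaluation
myvehicle$class <- NULL     # k-means is unsupervised and uses numeric features only

stopifnot(all(sapply(myvehicle, is.numeric)))  # every remaining column must be numeric

# Standardize each column to mean 0 and standard deviation 1 so that variables
# measured on larger scales do not dominate the Euclidean distances in k-means
myvehicle <- scale(myvehicle)
round(colMeans(myvehicle), 10)  # all approximately 0
apply(myvehicle, 2, sd)         # all 1
```

Note that scale returns a matrix rather than a data frame. kmeans accepts either.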

3. Run the kmeans function with k = 4 and store the output in the variable kc. Include the command in the report and discuss the input parameters you used. Type kc at the command prompt and press Enter. Include the command output in the report and answer the following questions.

  • How many instances are in each cluster?
  • What information does the "Cluster means" section of the output provide, and how were the numbers obtained?
  • What is the clustering vector?
  • What is the within-cluster sum of squares by cluster, and what does it mean?
  • Run the kc$iter command, and explain what the output shows. Include the command, the output, and explanation in the report.
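Step 3 can be sketched as follows. The sketch repeats the iris stand-in setup so that it runs on its own. The assumption is that you substitute the real myvehicle built from vehicle.csv. The code recomputes the "Cluster means" and the per-cluster sums of squares by hand, which shows where the printed numbers come from. nstart = 25 is an illustrative choice. kmeans defaults to nstart = 1 and iter.max = 10.

```r
# ASSUMPTION: iris stands in for the scaled vehicle data
set.seed(123)
vehicle <- iris
names(vehicle)[names(vehicle) == "Species"] <- "class"
myvehicle <- vehicle
myvehicle$class <- NULL
myvehicle <- scale(myvehicle)

# centers = 4 is k. nstart = 25 (illustrative) tries 25 random starts and keeps
# the one with the lowest total within-cluster SS
kc <- kmeans(myvehicle, centers = 4, nstart = 25)
kc          # sizes, cluster means, clustering vector, within-cluster SS

kc$size     # number of instances in each cluster
kc$iter     # iterations the best run needed to converge
head(kc$cluster)  # clustering vector: the cluster assigned to each row

# "Cluster means" = average of each scaled variable over the cluster's members
manual_means <- aggregate(as.data.frame(myvehicle),
                          by = list(cluster = kc$cluster), FUN = mean)
manual_means

# Within-cluster SS of cluster j = sum of squared distances to its centroid
manual_withinss <- sapply(seq_len(nrow(kc$centers)), function(j) {
  members <- myvehicle[kc$cluster == j, , drop = FALSE]
  sum(sweep(members, 2, kc$centers[j, ])^2)
})
manual_withinss

# Ratio printed as "between_SS / total_SS"
kc$betweenss / kc$totss
```

A higher between_SS / total_SS ratio means the clusters are more compact relative to the overall spread. The ratio always grows as k increases, so it should not be maximized on its own.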

4. Clustering evaluation

Build a cross-tabulation that compares how the method clustered the vehicles with their actual vehicle class. Include the command and the output in the report, and answer the following questions.

  • What is the dominant vehicle class in each cluster?
  • What additional information does the table show?
  • What percentage of vehicles were clustered in agreement with the actual class?
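The evaluation step can be sketched as follows, again using the iris stand-in so that the block runs on its own. The agreement figure uses an assumed "majority label" rule: each cluster is labeled with its dominant class, and the sketch counts the instances whose actual class matches that label. Other rules, such as a one-to-one matching of clusters to classes, give different percentages, so state in the report which rule you use.

```r
# ASSUMPTION: iris stands in for vehicle.csv; "class" is the label column
set.seed(123)
vehicle <- iris
names(vehicle)[names(vehicle) == "Species"] <- "class"
myvehicle <- scale(vehicle[, names(vehicle) != "class"])
kc <- kmeans(myvehicle, centers = 4, nstart = 25)

# Rows: actual class; columns: cluster assigned by k-means
tab <- table(actual = vehicle$class, cluster = kc$cluster)
tab

# Dominant (most frequent) class in each cluster
dominant <- rownames(tab)[apply(tab, 2, which.max)]
names(dominant) <- colnames(tab)
dominant

# Majority-label agreement: instances whose class equals their cluster's dominant class
agreement <- sum(apply(tab, 2, max)) / sum(tab)
round(100 * agreement, 1)
```

The table also shows how each class is split across clusters, and which classes the method tends to confuse. With the stand-in data there are more clusters than classes, so at least one class is necessarily spread over several clusters.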

5. Build the cluster plot. Include the command, the plot, and your interpretation of the plot in the report.
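One way to build the cluster plot is clusplot from the cluster package, a recommended package that ships with standard R installations. That this is the plot the course expects is an assumption. clusplot projects the data onto its first two principal components and draws an ellipse around each cluster. The sketch uses the iris stand-in. It also computes, with prcomp, the share of variance that the two plotted components capture.

```r
library(cluster)  # provides clusplot()

# ASSUMPTION: iris stands in for the scaled vehicle data
set.seed(123)
vehicle <- iris
names(vehicle)[names(vehicle) == "Species"] <- "class"
myvehicle <- scale(vehicle[, names(vehicle) != "class"])
kc <- kmeans(myvehicle, centers = 4, nstart = 25)

clusplot(myvehicle, kc$cluster,
         color  = TRUE,   # colour ellipses according to their density
         shade  = TRUE,   # shade ellipses according to their density
         labels = 2,      # label both points and ellipses
         lines  = 0,      # no distance lines between ellipses
         main   = "k-means clusters (k = 4)")

# Proportion of total variance captured by the two plotted components
pc  <- prcomp(myvehicle)
pve <- pc$sdev^2 / sum(pc$sdev^2)
round(100 * sum(pve[1:2]), 2)
```

When interpreting the plot, note that overlapping ellipses in the 2-D projection do not necessarily mean the clusters overlap in the full feature space. The lower the variance the two components capture, the less faithful the picture is.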

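6. Experiment with three k values of your choice in addition to k = 4, and summarize the findings in the following table.

| k                    | Number of instances in each cluster | Between-cluster sum of squares | Within-cluster sum of squares | Number of iterations |
|----------------------|-------------------------------------|--------------------------------|-------------------------------|----------------------|
| 4                    |                                     |                                |                               |                      |
| Value of your choice |                                     |                                |                               |                      |
| Value of your choice |                                     |                                |                               |                      |
| Value of your choice |                                     |                                |                               |                      |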

Explain how the value of k affects the method's results.

What is the ideal value of k for the vehicle data? (This is an open-ended question.)
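The table can be filled programmatically, and an elbow plot offers one heuristic for the open-ended question about the ideal k. This is a sketch on the iris stand-in. The values 2, 3, and 5 are placeholders for "value of your choice", not recommendations. The elbow method is one of several heuristics, and silhouette width is a common alternative.

```r
# ASSUMPTION: iris stands in for the scaled vehicle data
set.seed(123)
vehicle <- iris
names(vehicle)[names(vehicle) == "Species"] <- "class"
myvehicle <- scale(vehicle[, names(vehicle) != "class"])

ks   <- c(4, 2, 3, 5)  # k = 4 plus three PLACEHOLDER values; substitute your own
fits <- lapply(ks, function(k) kmeans(myvehicle, centers = k, nstart = 25))

summary_table <- data.frame(
  k          = ks,
  sizes      = sapply(fits, function(f) paste(f$size, collapse = ", ")),
  between_ss = sapply(fits, function(f) round(f$betweenss, 2)),
  within_ss  = sapply(fits, function(f) round(f$tot.withinss, 2)),
  iterations = sapply(fits, function(f) f$iter)
)
summary_table

# Elbow heuristic: look for the k after which the total within-cluster SS
# stops dropping sharply
wss <- sapply(1:10, function(k) kmeans(myvehicle, centers = k, nstart = 25)$tot.withinss)
plot(1:10, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

Because the within-cluster SS keeps shrinking as k grows, the "ideal" k is a trade-off between compactness and interpretability. It is not the k with the smallest within-cluster SS.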

7. Summary

  • What differences between k-means clustering and classification methods did you observe?
  • Which part of this exercise did you find the most challenging, and what approach did you take to resolve it?

