Question 1: Cluster Analysis

The SPSS file "metropolitan areas.sav" contains a data set taken from "Cities - Life in the World's 100 largest metropolitan areas, Population Crisis Committee, Washington, 1990". The data set includes information about the following variables:

Population = population in millions

Murders = number of murders per year per 100,000 people

Food = percentage of income spent on food

Pproom = average number of persons living in one room

Water = % of homes with access to water and electricity

Telephone = number of telephones per 100 people

School = % of children completing education to age 18 years

Infant death = infant deaths per 100 live births

Noise = ambient noise level on scale 1 (quietest) to 10 (noisiest)

Traffic = traffic flow: average mph of traffic in rush hour

area = area code: 1 = USA, Canada, Europe, Japan, Australia

In order to reduce the complexity of the data, I have conducted a cluster analysis.

a) What does an agglomeration schedule tell us in general? Provide a brief hypothetical example (using the Metropolitan Areas case), outlining the circumstances in which we might be interested in interpreting the agglomeration schedule.

b) When performing the hierarchical cluster analysis, I decided to select a four-cluster solution. Would you have chosen the same number of clusters? What are the criteria for making this decision?

c) Please briefly summarize the key findings from the K-Means cluster solution. Do you believe it is a good solution? How would you label the clusters? What could be done to try improving the cluster solution?

d) As you can see in the dialog box for the K-Means cluster analysis, I did not specify any initial cluster means before performing the analysis. Why does it normally make sense to predetermine these values? What kinds of cluster means would make sense here as an input to the K-means cluster model?

e) Imagine that we obtain data from additional cities that are not currently included in our data set. How can I assign these new observations to one of the clusters identified in our previous analysis?
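
Parts d) and e) can be illustrated with a minimal Python sketch (not SPSS output). It assumes the clustering variables are standardized, that the initial cluster means are supplied by the analyst (one common choice is the centroids from the hierarchical solution), and that a new city is assigned to the cluster whose final center is nearest in Euclidean distance. The two variables and all numbers below are made up for illustration and are not taken from "metropolitan areas.sav"; the function names are hypothetical.

```python
import math

def standardize(rows):
    """Return z-scored rows plus the per-column means and standard deviations."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1))
           for c, m in zip(cols, means)]
    z = [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]
    return z, means, sds

def nearest(point, centers):
    """Index of the center with the smallest Euclidean distance to point."""
    return min(range(len(centers)), key=lambda k: math.dist(point, centers[k]))

def kmeans(rows, init_centers, max_iter=100):
    """Basic k-means iteration started from user-supplied initial centers."""
    centers = [list(c) for c in init_centers]
    for _ in range(max_iter):
        labels = [nearest(r, centers) for r in rows]
        new_centers = []
        for k, old in enumerate(centers):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            # Keep the previous center if a cluster ends up empty.
            new_centers.append(
                [sum(col) / len(members) for col in zip(*members)] if members else old
            )
        if new_centers == centers:  # assignments no longer move the centers
            break
        centers = new_centers
    return centers, [nearest(r, centers) for r in rows]

# Hypothetical (Telephone, Food) values for six cities; illustrative only.
cities = [[55.0, 15.0], [60.0, 12.0], [48.0, 18.0],
          [5.0, 55.0], [3.0, 60.0], [8.0, 50.0]]
z, means, sds = standardize(cities)

# Assumed initial centers: here simply the first and fourth city.
centers, labels = kmeans(z, [z[0], z[3]])

# A new city is scaled with the ORIGINAL means and standard deviations,
# then assigned to the nearest final cluster center.
new_city = [50.0, 16.0]
z_new = [(x - m) / s for x, m, s in zip(new_city, means, sds)]
cluster = nearest(z_new, centers)
print(labels, cluster)
```

The key design point in this sketch is that the new observation is standardized with the means and standard deviations of the original sample, so that it is measured on the same scale as the cluster centers.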

Question 2: Logistic Regression

A study was done to examine the characteristics of MBA graduates from four top US business schools. From the study, a subset of 100 students was selected. The data sample includes information on each student's profile with respect to

1. Grade Point Average (GPA)

2. GMAT Score

3. College Major

a. Humanities/Social Science (binary: 1=yes, 0=no)

b. Maths/Engineering (binary: 1=yes, 0=no)

c. Business (binary: 1=yes, 0=no)

4. Gender (1=Male, 2=Female)

5. Work Experience (1=1 year, 2=2 years,...,6=more than 6 years)

One of the business schools (variable name: School_B), which is located on the East Coast, has analyzed the data in order to better understand the profile of its MBA students in comparison to students at other top schools. In particular, a logistic regression analysis was performed using a binary variable (attendance=1; non-attendance=0) to predict the probability that a student in the survey attended School_B (instead of one of the other three schools).

The following screenshots display the steps taken when performing the logistic regression analysis in SPSS. The SPSS output report can be found in a separate file called Appendix 2.

a) Based on the SPSS output provided in Appendix 2, is this a good model for predicting whether MBA students in the sample attended School_B? Please justify your answer from a statistical point of view by assessing model fit and overall model significance.

b) According to the output report, the significance level of the Hosmer-Lemeshow test is p=0.713. What does this mean? Is this good or bad news?

c) What types of students does School B attract? What are the most important predictors for attendance of School B?

d) In the output report you can see that GPA is a significant predictor of attendance at School B. Moreover, the exponentiated unstandardized slope coefficient for GPA is Exp(B)=22.794. What does this mean?

e) According to the classification plot at the end of the SPSS output report, does the model seem to be better at predicting "attendance" or "non-attendance" at School B? Would you say that 0.5 is a reasonable cut-off value as a classification threshold?
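
As a rough illustration of parts d) and e), the Python sketch below treats Exp(B) as an odds ratio (the factor by which the odds of attendance are multiplied per one-unit increase in GPA, other predictors held constant) and shows how the share of correctly classified attendees and non-attendees shifts with the cut-off. Only the value 22.794 comes from the question. The predicted probabilities and outcomes are made up and are not taken from Appendix 2, and the function names are hypothetical.

```python
import math

EXP_B_GPA = 22.794  # Exp(B) for GPA as quoted in the question

# B is the logit coefficient; Exp(B) is its exponential (an odds ratio).
b_gpa = math.log(EXP_B_GPA)

def odds_multiplier(exp_b, delta):
    """Factor by which the odds change when the predictor rises by delta units."""
    return exp_b ** delta

def classification_rates(probs, actual, cutoff=0.5):
    """Share of attendees (1) and of non-attendees (0) classified correctly."""
    predicted = [1 if p >= cutoff else 0 for p in probs]
    hits_1 = sum(1 for p, a in zip(predicted, actual) if a == 1 and p == 1)
    hits_0 = sum(1 for p, a in zip(predicted, actual) if a == 0 and p == 0)
    n1 = sum(actual)
    n0 = len(actual) - n1
    return hits_1 / n1, hits_0 / n0

print(f"B for GPA is about {b_gpa:.3f}")
print(f"Odds multiplier for +0.1 GPA: {odds_multiplier(EXP_B_GPA, 0.1):.3f}")

# Hypothetical predicted probabilities and observed outcomes; illustrative only.
probs = [0.10, 0.20, 0.30, 0.35, 0.45, 0.55, 0.40, 0.60, 0.80, 0.90]
actual = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

for cutoff in (0.5, 0.4):
    att, non = classification_rates(probs, actual, cutoff)
    print(f"cutoff={cutoff}: attendees correct={att:.2f}, "
          f"non-attendees correct={non:.2f}")
```

In this made-up example, lowering the cut-off classifies more attendees correctly at the cost of more misclassified non-attendees. Whether 0.5 is reasonable for the actual data depends on the group sizes and on which type of error matters more.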

Assignment Files -

https://www.dropbox.com/s/szbkh90yj0f8kk6/Assignment%20Files.zip?dl=0
