Question 1: Cluster Analysis

The SPSS file "metropolitan areas.sav" contains a data set taken from "Cities - Life in the World's 100 Largest Metropolitan Areas" (Population Crisis Committee, Washington, 1990). The data include information on the following variables:

Population = population in millions

Murders = number of murders per year per 100,000 people

Food = percentage of income spent on food

Pproom = average number of persons living in one room

Water = % of homes with access to water and electricity

Telephone = number of telephones per 100 people

School = % of children completing education to age 18 years

Infant death = infant deaths/100 live births

Noise = ambient noise level on scale 1 (quietest) to 10 (noisiest)

Traffic = traffic flow: average mph of traffic in rush hour

area = area code: 1 = USA, Canada, Europe, Japan, Australia

In order to reduce the complexity of the data, I have conducted a cluster analysis.

a) What does an agglomeration schedule tell us in general? Provide a brief hypothetical example (using the Metropolitan Areas case), outlining the circumstances in which we might be interested in interpreting the agglomeration schedule.

b) When performing the hierarchical cluster analysis, I decided to select a 4-cluster solution. Would you have chosen the same number of clusters? What are the criteria for making this decision?

c) Please briefly summarize the key findings from the K-Means cluster solution. Do you believe it is a good solution? How would you label the clusters? What could be done to try to improve the cluster solution?

d) As you can see in the dialog box for the K-Means cluster analysis, I did not specify any initial cluster means before performing the analysis. Why does it normally make sense to predetermine these values? What kinds of cluster means would make sense here as an input to the K-means cluster model?

e) Imagine that we obtain data from additional cities that are not currently included in our data set. How can I assign these new observations to one of the clusters identified in our previous analysis?
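For part e), one common approach is to assign each new city to the cluster whose final centre is nearest in Euclidean distance. The following is a minimal sketch of that idea, not the SPSS procedure itself. The cluster centres, sample means and standard deviations below are hypothetical placeholders (they are not taken from the assignment output), and the sketch assumes the clusters were built on standardised variables; if the raw variables were used, the standardisation step would be dropped.

```python
import math

# Hypothetical final cluster centres on two standardised variables
# (Telephone, Infant death). These are NOT values from the SPSS output.
CENTRES = {
    1: (1.2, -0.9),
    2: (-0.3, 0.1),
    3: (-1.0, 1.4),
}

# Hypothetical means and standard deviations of the ORIGINAL sample, used to
# put a new city on the same scale as the data the clusters were built from.
MEANS = (30.0, 4.0)
SDS = (20.0, 3.0)


def standardise(raw):
    """Convert raw values to z-scores using the original sample's mean and SD."""
    return tuple((x - m) / s for x, m, s in zip(raw, MEANS, SDS))


def assign_cluster(raw):
    """Return the label of the centre with the smallest Euclidean distance."""
    z = standardise(raw)
    return min(CENTRES, key=lambda label: math.dist(z, CENTRES[label]))


# Hypothetical new city: 62 telephones per 100 people, 1 infant death per 100 live births.
new_city = (62.0, 1.0)
print(assign_cluster(new_city))
```

The important design choice is that the new observation is scaled with the statistics of the original sample rather than re-estimated ones, so that its distances are comparable to those used when the clusters were formed.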

Question 2: Logistic Regression

A study was done to examine the characteristics of MBA graduates from four top US business schools. From the study, a subset of 100 students was selected. The data sample includes information on each student's profile with respect to

1. Grade Point Average (GPA)

2. GMAT Score

3. College Major

a. Humanities/Social Science (binary: 1=yes, 0=no)

b. Maths/Engineering (binary: 1=yes, 0=no)

c. Business (binary: 1=yes, 0=no)

4. Gender (1=Male, 2=Female)

5. Work Experience (1=1 year, 2=2 years,...,6=more than 6 years)

One of the business schools (variable name: School_B), which is located on the East Coast, has analyzed the data in order to better understand the profile of its MBA students in comparison to students at other top schools. In particular, a logistic regression analysis was performed using a binary variable (attendance=1; non-attendance=0) to predict the probability that a student in the survey attended School_B (instead of one of the other three schools).

The following screenshots display the steps taken when performing the logistic regression analysis in SPSS. The SPSS output report can be found in a separate file called Appendix 2.

a) Based on the SPSS output provided in Appendix 2, is this a good model for predicting whether MBA students in the sample attended School_B? Please justify your answer from a statistical point of view by assessing model fit and overall model significance.

b) According to the output report, the significance level of the Hosmer-Lemeshow test is p=0.713. What does this mean? Is this good or bad news?

c) What types of students does School B attract? What are the most important predictors for attendance of School B?

d) In the output report you can see that GPA is a significant predictor of attendance at School B. Moreover, the exponentiated unstandardized slope coefficient for GPA (the odds ratio) is Exp(B)=22.794. What does this mean?

e) According to the classification plot at the end of the SPSS output report, does the model seem to be better at predicting "attendance" or "non-attendance" at School B? Would you say that 0.5 is a reasonable cut-off value as a classification threshold?
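To make the cut-off question in part e) concrete, the sketch below shows how sensitivity (correctly predicted attendance), specificity (correctly predicted non-attendance) and overall accuracy change as the classification threshold moves. The observed outcomes and predicted probabilities are invented for illustration and are not the values in Appendix 2; the rule "classify as attendance when the predicted probability is at or above the cut-off" is also an assumption of this sketch.

```python
# Hypothetical (observed outcome, predicted probability) pairs; these are NOT
# values from Appendix 2, only an illustration of the calculation.
CASES = [
    (1, 0.91), (1, 0.74), (1, 0.58), (1, 0.42), (1, 0.35),
    (0, 0.66), (0, 0.48), (0, 0.31), (0, 0.22), (0, 0.15),
    (0, 0.09), (0, 0.05),
]


def classification_rates(cases, cutoff):
    """Return (sensitivity, specificity, overall accuracy) at a given cut-off."""
    tp = sum(1 for y, p in cases if y == 1 and p >= cutoff)  # attendance predicted correctly
    fn = sum(1 for y, p in cases if y == 1 and p < cutoff)   # attendance missed
    tn = sum(1 for y, p in cases if y == 0 and p < cutoff)   # non-attendance predicted correctly
    fp = sum(1 for y, p in cases if y == 0 and p >= cutoff)  # non-attendance misclassified
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(cases)
    return sensitivity, specificity, accuracy


for cutoff in (0.3, 0.4, 0.5):
    sens, spec, acc = classification_rates(CASES, cutoff)
    print(f"cut-off {cutoff:.1f}: sensitivity={sens:.2f} "
          f"specificity={spec:.2f} accuracy={acc:.2f}")
```

When the two outcome groups are unequal in size, as in this toy data, a cut-off of 0.5 tends to favour the larger group, which is one reason to compare several thresholds rather than accept the default.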

Assignment Files -

https://www.dropbox.com/s/szbkh90yj0f8kk6/Assignment%20Files.zip?dl=0

  • Category:- Applied Statistics
  • Reference No.:- M92248522
