Q1) Bicycle helmet use. The table below lists data from a cross-sectional survey of bicycle safety. The explanatory variable is a measure of neighbourhood socioeconomic status (variable P_RFM). The response variable is the percent of bicycle riders wearing a helmet (variable P_HELM).
Table: Data for this exercise. Percent of school children receiving free or reduced-fee lunches at school (variable P_RFM) and percent of bicycle riders wearing a helmet (variable P_HELM). Data for this study were recorded by field observers in October 1994.
| i | School | P_RFM | P_HELM |
|---|--------|-------|--------|
| 1 | Fair Oaks | 50 | 2.1 |
| 2 | Strandwood | 11 | 35.9 |
| 3 | Walnut Acres | 2 | 57.9 |
| 4 | Disc Bay | 19 | 22.2 |
| 5 | Belshaw | 26 | 42.4 |
| 6 | Kennedy | 73 | 5.8 |
| 7 | Cassell | 81 | 3.6 |
| 8 | Miner | 51 | 21.4 |
| 9 | Sedgewick | 11 | 55.2 |
| 10 | Sakamoto | 2 | 33.3 |
| 11 | Toyon | 19 | 32.4 |
| 12 | Lietz | 25 | 38.4 |
| 13 | Los Arboles | 84 | 46.6 |
i) Create a scatterplot of P_RFM and P_HELM. If drawing the plot by hand, use graph paper to ensure accuracy. Be sure to label the axes. After you have created the scatterplot, consider its form and direction. Identify any outliers.
ii) Compute r for all 13 data points. Describe the strength of the correlation.
iii) A good case can be made that observation 13 (Los Arboles) is an outlier. Describe what this means in plain terms.
iv) In practice, the next step in the analysis would be to identify the cause of the outlier. Assume we find out that Los Arboles had a special program in place to support helmet use. In this sense, it comes from a different population, so we decide to exclude it from further analyses. Remove this outlier and recalculate r. To what extent did removal of the outlier improve the fit? Then test H0: ρ = 0 (excluding outlying observation 13).
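The correlation and the hypothesis test asked for above can be checked numerically. The following is a minimal sketch in plain Python, assuming the data are entered exactly as in the table and that the test of H0: ρ = 0 uses the usual t statistic t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom. The function names `pearson_r` and `t_statistic` are illustrative and not part of the exercise.

```python
import math

# Data transcribed from the table (observations 1..13, in order).
p_rfm = [50, 11, 2, 19, 26, 73, 81, 51, 11, 2, 19, 25, 84]
p_helm = [2.1, 35.9, 57.9, 22.2, 42.4, 5.8, 3.6,
          21.4, 55.2, 33.3, 32.4, 38.4, 46.6]


def pearson_r(x, y):
    """Pearson correlation coefficient from sums of squares and cross-products."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_xx = sum((a - mean_x) ** 2 for a in x)
    ss_yy = sum((b - mean_y) ** 2 for b in y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)


def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom (assumed test)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# r for all 13 observations.
r_all = pearson_r(p_rfm, p_helm)

# r after dropping observation 13 (Los Arboles), the last entry in each list.
r_excl = pearson_r(p_rfm[:-1], p_helm[:-1])

# Test of H0: rho = 0 on the remaining 12 observations (df = 10).
n_excl = len(p_rfm) - 1
t_excl = t_statistic(r_excl, n_excl)

print(f"r (all 13 points)       = {r_all:.3f}")
print(f"r (excluding obs. 13)   = {r_excl:.3f}")
print(f"t (df = {n_excl - 2})             = {t_excl:.3f}")
```

Compare |t| with the two-sided critical value from a t table for 10 degrees of freedom (2.228 at α = 0.05) to decide whether to reject H0. Comparing the two values of r shows how much a single outlying school weakens the apparent association.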