A 10-year study by the American Heart Association provided data on how age, blood pressure, and smoking relate to the risk of stroke. Data from a portion of this study are shown below. Risk is interpreted as the probability (times 100) that a person will have a stroke over the next 10-year period. For the smoker variable, 1 indicates that the person is a smoker and 0 indicates a non-smoker.
A) If you could choose only one of the variables to help you predict the Risk, which variable would you choose? Why? How good would that model be at predicting the Risk?
B) Are there any independent variables that should not be in the model at the same time when you are trying to predict the Risk? If so, which variables? Why?
C) Develop the "best" regression model possible using this set of variables to help you predict the Risk. State your final model. Interpret the coefficients for the model (i.e., what do the numbers mean?). Finally, explain how to use this model to predict a person's risk of a stroke. Illustrate this with numbers and interpret the resulting prediction.
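
One way to approach parts A to C is sketched below in Python with NumPy. This is only an illustration under stated assumptions, not the intended solution. The arrays are made-up placeholder values (the Risk column is generated from an arbitrary noise-free formula so the fit can be checked), not the study data, and the helper names `fit_ols` and `predict` are hypothetical. Replace the arrays with the actual data before drawing any conclusions.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit of y on the columns of X plus an intercept.

    Returns (coef, r_squared), where coef[0] is the intercept and
    coef[1:] are the slopes in the same order as the columns of X.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a single predictor as one column
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return coef, 1.0 - ss_res / ss_tot


def predict(coef, values):
    """Predicted Risk for one person, given predictor values in column order."""
    return float(coef[0] + np.dot(coef[1:], values))


# PLACEHOLDER values for illustration only; these are NOT the study data.
age = np.array([50, 55, 60, 65, 70, 75, 80, 85], dtype=float)
pressure = np.array([120, 150, 135, 170, 125, 160, 145, 180], dtype=float)
smoker = np.array([0, 1, 0, 1, 1, 0, 0, 1], dtype=float)
# Arbitrary noise-free formula, chosen only so the example is checkable.
risk = -60 + 0.8 * age + 0.2 * pressure + 10 * smoker

predictors = {"Age": age, "Pressure": pressure, "Smoker": smoker}

# Part A: fit each variable alone and compare R^2.
single_r2 = {name: fit_ols(x, risk)[1] for name, x in predictors.items()}
for name, r2 in single_r2.items():
    print(f"{name} alone: R^2 = {r2:.3f}")

# Part B: pairwise correlations among the predictors; a correlation
# near +1 or -1 would suggest the two variables carry overlapping information.
corr = np.corrcoef(np.vstack([age, pressure, smoker]))
print(np.round(corr, 3))

# Part C: fit all three variables together and predict for one person.
X_full = np.column_stack([age, pressure, smoker])
coef_full, r2_full = fit_ols(X_full, risk)
print("intercept, Age, Pressure, Smoker:", np.round(coef_full, 3))
print(f"full model R^2 = {r2_full:.3f}")

# Hypothetical person: age 68, blood pressure 175, smoker.
person_risk = predict(coef_full, [68, 175, 1])
print(f"predicted Risk = {person_risk:.1f}")
```

Each slope is read as the change in predicted Risk for a one-unit increase in that variable with the others held fixed. The Smoker coefficient is the difference in predicted Risk between a smoker and a non-smoker of the same age and blood pressure. Because Risk is a probability times 100, a predicted value of 25 would be read as roughly a 25% chance of a stroke over the next 10 years. A full answer to part C would also check whether each coefficient is statistically significant, which this sketch does not do.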