First, read Fuguitt and Wilcox, pages 181-183. Then, from the Stock and Watson text Web site, download the data file CAschooldistricts, which contains data on 420 school districts in California. A detailed description is given in CAschool_Description, also available on the Web site. One of the variables is the student-teacher ratio, str. In this exercise you will investigate how the student-teacher ratio is related to a district's average test scores.
a. Imagine you constructed a scatterplot of test scores (testscr) on the student-teacher ratio (str). Explain in one sentence how the method of ordinary least squares (OLS) would fit a line to these points.
b. Run a regression of test scores (testscr) on the district's student-teacher ratio (str). What is the estimated intercept? What are the sign and magnitude of the estimated slope coefficient?
c. What is the standard error of the estimate? Provide an intuitive explanation for what the standard error of the estimate tells us.
d. Calculate the residuals for this regression. Then, construct a scatterplot of these residuals against the student-teacher ratio (str). Note: this type of scatterplot is also referred to as a residual plot. Does there appear to be a relationship between the residuals and the str variable and, if so, does it suggest that there is heteroskedasticity?
e. Is the estimated regression slope coefficient statistically significant? That is, can you reject the null hypothesis H0: Beta1 = 0 against a two-sided alternative at the 5% significance level? What is the 95% confidence interval for the slope coefficient?
f. Does str explain a large fraction of the variance in test scores across school districts? How can one answer this question using the coefficient of determination (R-squared)?
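The quantities asked for in parts b through f all follow from a handful of sums, so it can help to see them computed in one place. The following is a minimal sketch in Python, using only the standard library. The function name simple_ols and the five demo observations are illustrative assumptions; the demo values are made up and are not taken from CAschooldistricts. The sketch uses the large-sample normal critical value (about 1.96), which suits a sample of 420 districts but not the tiny demo sample. It reports both the homoskedasticity-only standard error and a heteroskedasticity-robust one, since the answer to part d bears on which is appropriate.

```python
import math
from statistics import NormalDist, mean


def simple_ols(x, y, level=0.95):
    """Fit y = b0 + b1*x by OLS and return summary statistics (illustrative helper)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Part b: OLS picks the line that minimizes the sum of squared residuals.
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Part d: residuals are actual minus fitted values.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ssr = sum(u ** 2 for u in residuals)        # sum of squared residuals
    tss = sum((yi - y_bar) ** 2 for yi in y)    # total sum of squares

    # Part c: standard error of the estimate (typical size of a residual).
    ser = math.sqrt(ssr / (n - 2))

    # Part e: standard errors of the slope.
    se_slope = ser / math.sqrt(sxx)             # assumes homoskedasticity
    se_slope_robust = math.sqrt(                # heteroskedasticity-robust
        (n / (n - 2))
        * sum(((xi - x_bar) * u) ** 2 for xi, u in zip(x, residuals))
        / sxx ** 2
    )

    # Large-sample critical value; an assumption that fits n = 420, not n = 5.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)

    return {
        "intercept": intercept,
        "slope": slope,
        "residuals": residuals,
        "ser": ser,
        "se_slope": se_slope,
        "se_slope_robust": se_slope_robust,
        "t_stat": slope / se_slope,
        "ci": (slope - z * se_slope, slope + z * se_slope),
        "r2": 1 - ssr / tss,                    # Part f: share of variance explained
    }


# Made-up demo data, NOT the California school district values.
x_demo = [1, 2, 3, 4, 5]
y_demo = [2, 4, 5, 4, 5]
fit = simple_ols(x_demo, y_demo)
print(fit["intercept"], fit["slope"], fit["ser"], fit["r2"])
```

With the actual data, x would be the str column and y the testscr column. Reject H0 at the 5% level when the absolute t-statistic exceeds the critical value or, equivalently, when the 95% confidence interval excludes zero. For the residual plot in part d, plot fit["residuals"] against x and look for a spread that widens or narrows as str changes.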