Instructions:
You must use STATA for the following problems. Your task is to apply econometric models to empirical questions. You will see the power of the OLS method, but also how complicated the real world can be. Some problems do not have exact answers, so creative answers are encouraged, but you must justify them briefly.
You need to work in groups of 2-4 students. (If you really want to do everything by yourself, please let me know beforehand.) Hand in one answer per group. If group members cannot agree on every answer, you may write down your points of disagreement separately. You are allowed to be in different groups for problem A and problem B. Your answers do not have to be long. If your answers exceed ten pages, you will lose some points. Email your answers to both the teaching assistant and the instructor before the due time, and we will reply by email to confirm receipt. Keep only the STATA commands, the key results and brief explanations. The total is 110 points, with 10 bonus points.
You may email us any question about STATA before 8pm on Aug 20. We will send our responses to everyone by group email to keep things fair. The STATA tutorial files already cover essentially all the commands you need, but anything more elaborate may require some new commands. The dataset already includes all the necessary variables. However, you are encouraged to add new variables in problem B if you wish. Let us know if you have any trouble importing data in other formats into STATA.
Problem A:
Use the dataset GPA1.DTA for the following problems. We are interested in which factors affect a student's college GPA. First use the command "describe" in STATA to make sure you understand the meaning of every variable. Use 5% as the significance level and 95% as the confidence level throughout.
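As a starting point, the data can be loaded and inspected as in the minimal sketch below. It assumes that GPA1.DTA sits in STATA's current working directory; adjust the path if it is stored elsewhere.

```stata
* Assumption: GPA1.DTA is in the current working directory.
use GPA1.DTA, clear

* List variable names, storage types and labels.
describe

* Basic summary statistics for the variables used in problem A
* (names as given in the problem text; confirm them with describe).
summarize colGPA hsGPA ACT male PC gradMI
```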
1) Use log(colGPA) as the dependent variable, log(hsGPA), ACT, male, PC and gradMI as our independent variables.
A) Write down the model.
B) Run the OLS regression and report your results. Explain clearly what each slope coefficient means.
C) Which variables are significant? Based on statistical analysis, which variables would you suggest dropping from the regression?
D) Report the confidence intervals for all coefficients.
E) Perform an F-test of the joint hypothesis that gender and achievement scores have no effect on college GPA. What does it tell you?
F) Based on problems C) and E), what is your suggestion for improving the model? Write down the new model, run the regression, and report the results. Do you think this new model is better than the original model? Why or why not?
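The steps in problem 1 might be sketched in STATA as follows. This is only an outline under stated assumptions: the names lcolGPA and lhsGPA are made up for illustration, and reading "gender and achievement scores" as male and ACT is one possible interpretation that you should justify or change.

```stata
* Assumption: GPA1.DTA is already loaded (use GPA1.DTA, clear).
* Model: log(colGPA) = b0 + b1*log(hsGPA) + b2*ACT + b3*male
*                      + b4*PC + b5*gradMI + u

* Hypothetical variable names for the logged series.
generate lcolGPA = ln(colGPA)
generate lhsGPA  = ln(hsGPA)

* OLS; the output table includes t statistics, p-values and
* 95% confidence intervals by default.
regress lcolGPA lhsGPA ACT male PC gradMI

* Joint F-test that the coefficients on male and ACT are both zero
* (assumed reading of "gender and achievement scores").
test male ACT
```

The test command must be run directly after the regression it refers to, since it uses the most recent estimation results.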
2) Use colGPA as the dependent variable, hsGPA, gradMI, PC and the interaction term between gradMI and PC as our independent variables.
A) Write down the model.
B) Run the OLS regression, report your results.
C) What is the estimated difference in colGPA between students who "graduated from a Michigan high school and have a personal computer" and students who "did NOT graduate from a Michigan high school but have a personal computer"? Test the null hypothesis that there is no difference against the alternative that there is a difference. (6 points)
D) What is the estimated difference in colGPA between students who "graduated from a Michigan high school but do NOT have a personal computer" and students who "did NOT graduate from a Michigan high school and do NOT have a personal computer"? Test the null hypothesis that there is no difference against the alternative that there is a difference. (6 points)
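One possible way to set up problem 2 is sketched below. The interaction variable name gradMI_PC is hypothetical. With the model written as colGPA = b0 + b1*hsGPA + b2*gradMI + b3*PC + b4*gradMI*PC + u, the gradMI effect among PC owners is b2 + b4, and among non-owners it is b2 alone, holding hsGPA fixed.

```stata
* Assumption: GPA1.DTA is already loaded.
* Hypothetical name for the interaction term.
generate gradMI_PC = gradMI * PC

regress colGPA hsGPA gradMI PC gradMI_PC

* C) Difference for students with a PC: b2 + b4.
*    lincom reports the estimate, standard error, t statistic,
*    p-value and 95% confidence interval of the combination.
lincom gradMI + gradMI_PC

* D) Difference for students without a PC: b2 alone.
lincom gradMI
```

Using lincom for D) simply reproduces the gradMI row of the regression table, but it keeps the two comparisons in the same format.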
Problem B:
The data cover the U.S. gasoline market, with 36 yearly observations from 1960 to 1995. We want to find out which variables affect gasoline consumption per person. Use 5% as the significance level and 95% as the confidence level throughout. The data description is below.
Data description:
The U.S. Gasoline Market, 36 Yearly Observations, 1960-1995
Source: Economic Report of the President: 1996, Council of Economic Advisors, 1996.
G_pop = U.S. gasoline consumption per person, computed as total expenditure per person divided by price index.
Pg = Price index for gasoline,
Y = Per capita disposable income,
Pnc = Price index for new cars,
Puc = Price index for used cars,
Ppt = Price index for public transportation,
Pd = Aggregate price index for consumer durables,
Pn = Aggregate price index for consumer nondurables,
Ps = Aggregate price index for consumer services,
Pop = U.S. total population in millions.
1) Run the regression with only one independent variable. From microeconomic theory, we know that the price (Pg) is the key factor.
A) Write down the model.
B) Run it. What result do you get? Do you think it makes sense?
C) What does the R-squared tell you?
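The simple regression in problem 1 could be run as in this minimal sketch. It assumes that the gasoline dataset is already loaded in memory and that the variable names match the data description above (G_pop, Pg); check them with describe first.

```stata
* Assumption: the gasoline data are already in memory and the
* variable names follow the data description (G_pop, Pg).
* Model: G_pop = b0 + b1*Pg + u
describe
regress G_pop Pg

* The R-squared appears in the header of the regression output;
* it is also stored after estimation and can be displayed directly.
display e(r2)
```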
2) Run a multiple regression of consumption per person on all the other variables, excluding the time trend and the population.
A) Report the result.
B) Perform a t-test for each coefficient. How would you describe the results?
C) Compare the coefficient on the price of gas with the one from the first model. Are they totally different or almost the same? How would you describe this? Give one possible reason.
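The multiple regression in problem 2 might look like the sketch below, again assuming the variable names from the data description and a dataset already in memory. The stored-estimates names m1 and m2 are hypothetical labels.

```stata
* Assumption: gasoline data in memory, names as in the description.
* Re-run the one-variable model and store it for comparison.
regress G_pop Pg
estimates store m1

* All other variables, leaving out population (and any time trend).
* The output table reports a t statistic and p-value per coefficient.
regress G_pop Pg Y Pnc Puc Ppt Pd Pn Ps
estimates store m2

* Side-by-side coefficients and standard errors for the two models.
estimates table m1 m2, se
```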
3) Use Log(G/Pop) as the dependent variable (generate logGPop=log(G_pop)), and use Log(Pg) together with the other variables from problem 2, except Pg itself, as the independent variables. Run the regression and report the result. What does the coefficient on Log(Pg) mean?
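A minimal sketch of the log specification follows. The name logGPop comes from the problem text, while logPg is a hypothetical name chosen here. In STATA, log() is the natural logarithm.

```stata
* Assumption: gasoline data in memory, names as in the description.
generate logGPop = log(G_pop)
* Hypothetical name for the logged gasoline price index.
generate logPg = log(Pg)

* Log-log in the price: the coefficient on logPg is read as an
* elasticity (approximate % change in G_pop for a 1% change in Pg),
* holding the other regressors fixed.
regress logGPop logPg Y Pnc Puc Ppt Pd Pn Ps
```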