Question 1 (6 points):  You have obtained measurements of height in inches of 29 female and 81 male students (Studenth) at your university. A regression of the height on a constant and a binary variable (BFemme), which takes a value of one for females and is zero otherwise, yields the following result:


 Studenth_hat = 71.0 – 4.84×BFemme, R² = 0.40, SER = 2.0
               (0.3)  (0.57)

(a) What is the interpretation of the intercept? What is the interpretation of the slope? How tall are females, on average?

(b) Test the hypothesis that females, on average, are shorter than males, at the 1% level. Make sure you illustrate your test graphically. Show all the steps.

(c) Is it likely that the error term is homoskedastic here? Explain.
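
The arithmetic behind parts (a) and (b) can be sketched as follows. This is a minimal illustration, not a full solution: it assumes the large-sample normal approximation for the critical value, and the variable names are chosen here for illustration only.

```python
from statistics import NormalDist

# Reported estimates from the height regression (taken from the question).
intercept = 71.0   # estimated average male height, in inches
slope = -4.84      # estimated female-minus-male difference, in inches
se_slope = 0.57    # standard error of the slope

# Average female height is the intercept plus the slope (BFemme = 1).
female_mean = intercept + slope

# One-sided test: H0: slope = 0 against H1: slope < 0.
t_stat = slope / se_slope

# 1% one-sided critical value, assuming a standard normal approximation.
critical_value = NormalDist().inv_cdf(0.01)

# Reject H0 when the t-statistic falls below the critical value.
reject_h0 = t_stat < critical_value

print(f"female mean = {female_mean:.2f} inches")
print(f"t = {t_stat:.2f}, critical value = {critical_value:.2f}, reject H0: {reject_h0}")
```

The rejection region lies in the left tail because the alternative hypothesis is that the slope is negative.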

Question 2 (4 points):  You have obtained a sub-sample of 1744 individuals from the Current Population Survey (CPS) and are interested in the relationship between weekly earnings and age. The regression, using heteroskedasticity-robust standard errors, yielded the following result:


 Earn_hat = 239.16 + 5.20×Age, R² = 0.05, SER = 287.21,
           (20.24)  (0.57)

where Earn and Age are measured in dollars and years respectively.

(a) Is the relationship between Age and Earn statistically significant? Justify the answer.

(b) The variance of the error term and the variance of the dependent variable are related. Given the distribution of earnings, do you think it is plausible that the distribution of errors is normal? Explain.
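
For part (a), the significance check reduces to a t-statistic and, optionally, a confidence interval. The sketch below assumes a two-sided test at the 5% level with normal critical values, which is reasonable with 1744 observations; the variable names are illustrative.

```python
from statistics import NormalDist

# Reported estimates from the earnings regression (taken from the question).
slope = 5.20      # dollars of weekly earnings per year of age
se_slope = 0.57   # heteroskedasticity-robust standard error

# t-statistic for H0: slope = 0.
t_stat = slope / se_slope

# Two-sided p-value under the standard normal approximation.
p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))

# 95% confidence interval for the slope.
z = NormalDist().inv_cdf(0.975)
ci_low = slope - z * se_slope
ci_high = slope + z * se_slope

print(f"t = {t_stat:.2f}, p-value = {p_value:.4f}")
print(f"95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
```

An interval that excludes zero leads to the same conclusion as a t-statistic larger than the critical value in absolute terms.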


Question 3 (4 points) Using the California School data set from your textbook, you run the following regression:

 TestScore_hat = 698.9 – 2.28×STR, n = 420, R² = 0.051, SER = 18.6

where TestScore is the average test score in the district and STR is the student-teacher ratio. Using heteroskedasticity-robust standard errors, you find a standard error of 0.52; with the homoskedasticity-only option, the standard error is 0.48.

a) Calculate the t-statistic for both standard errors when testing the significance of the slope. Show all the steps.

b) Which of the two t-statistics should you base your inference on?
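
The two t-statistics in part (a) differ only in the standard error used in the denominator. A minimal sketch of the calculation, with illustrative variable names:

```python
# Reported slope and the two standard errors (taken from the question).
slope = -2.28
se_robust = 0.52   # heteroskedasticity-robust standard error
se_homosk = 0.48   # homoskedasticity-only standard error

# t-statistic for H0: slope = 0 is the estimate divided by its standard error.
t_robust = slope / se_robust
t_homosk = slope / se_homosk

print(f"robust t = {t_robust:.2f}")
print(f"homoskedasticity-only t = {t_homosk:.2f}")
```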

Question 4 (24 points): This is a computer-based question that requires the use of Stata.

The data file HW2 contains data from a random sample of high school seniors interviewed in 1980 and re-interviewed in 1986. Your goal is to investigate the relationship between the number of completed years of education for young adults and the distance from each student’s high school to the nearest four-year college. (Proximity to college lowers the cost of education, so that students who live closer to a four-year college should, on average, complete more years of higher education.) A detailed description is given under FILES> Homework > HW2.

a) Run a regression of years of completed education (ED) on distance to the nearest college (Dist), where Dist is measured in tens of miles. (For example, Dist = 2 means that the distance is 20 miles.) What is the estimated intercept? What is the estimated slope? Interpret these estimates in words.

b) How does the average value of years of completed schooling change when colleges are built close to where students go to high school?

c) John’s high school was 20 miles from the nearest college. Predict John’s years of completed education. How would the prediction change if John lived 10 miles from the nearest college? Show all the steps.

d) Does distance to college explain a large fraction of the variance in educational attainment across individuals? Explain.

e) Is the estimated regression slope coefficient statistically significant? Conduct your test at the 10%, 5%, and 1% significance levels. What is the p-value associated with the coefficient’s t-statistic?

f) Construct a 95% confidence interval for the slope coefficient.

g) Run the regression using data only on females and repeat (b).

h) Run the regression using data only on males and repeat (b).

i) Is the effect of distance on completed years of education different for men than for women?


Based on the regression results, the following linear equation is obtained:
ED_hat = 13.96 – 0.073×Dist
The results are therefore: (1) the estimated intercept is 13.96 years; (2) the estimated slope is –0.073 years for every 10 miles.
Accordingly, this suggests that when the distance (Dist) between the student’s high school and the nearest college increases by about 10 miles, the estimated completed years of education (ED) is ...
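
As an illustration of how the fitted line is used for prediction (part c), the sketch below plugs John's two distances into the estimated equation. The function name `predict_ed` is hypothetical and chosen here for illustration; the coefficients are the ones reported in the fitted line.

```python
# Coefficients from the fitted line ED_hat = 13.96 - 0.073 * Dist.
INTERCEPT = 13.96
SLOPE = -0.073


def predict_ed(dist_tens_of_miles: float) -> float:
    """Predicted years of completed education; Dist is in tens of miles."""
    return INTERCEPT + SLOPE * dist_tens_of_miles


# John's high school is 20 miles away (Dist = 2) versus 10 miles away (Dist = 1).
ed_20_miles = predict_ed(2)
ed_10_miles = predict_ed(1)
change = ed_10_miles - ed_20_miles

print(f"20 miles: {ed_20_miles:.3f} years")
print(f"10 miles: {ed_10_miles:.3f} years")
print(f"change:   {change:.3f} years")
```

Moving 10 miles closer changes the prediction by the absolute value of the slope, since Dist falls by exactly one unit.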
