The file P14_58.xlsx contains monthly cost accounting data on overhead costs, machine hours, and direct material costs. This problem will help you explore the meaning of R² and the relationship between R² and correlations.

a. Create a table of correlations between the individual variables.
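As a rough illustration of part a, the sketch below builds a correlation table with NumPy. The numbers are made-up stand-ins, since the contents of P14_58.xlsx are not reproduced here. The sample size of 36 months, the variable names, and the data-generating coefficients are all assumptions. With the real workbook, the three columns would be loaded first (for example with `pandas.read_excel`) and passed in the same way.

```python
import numpy as np

# Hypothetical stand-in data; the real values live in P14_58.xlsx.
rng = np.random.default_rng(0)
n = 36  # assumed number of monthly observations
machine_hours = rng.normal(1500, 200, n)
direct_material = rng.normal(30000, 5000, n)
overhead = 4000 + 40 * machine_hours + 0.8 * direct_material + rng.normal(0, 3000, n)

names = ["MachHrs", "DirMatCost", "OHCost"]  # assumed labels
data = np.vstack([machine_hours, direct_material, overhead])

# np.corrcoef treats each row as one variable and returns the full matrix.
corr = np.corrcoef(data)

# Print the matrix as a labeled table.
print(" " * 12 + "".join(f"{name:>12}" for name in names))
for name, row in zip(names, corr):
    print(f"{name:>12}" + "".join(f"{value:12.3f}" for value in row))
```

The table is symmetric with ones on the diagonal, so only the three off-diagonal correlations carry information.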

b. If you ignore the two explanatory variables Machine Hours and Direct Material Cost and predict each Overhead Cost as the mean of Overhead Cost, then a typical “error” is Overhead Cost minus the mean of Overhead Cost. Find the sum of squared errors using this form of prediction, where the sum is over all observations.
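The calculation in part b can be sketched in a few lines. The overhead values here are simulated placeholders, not the data in P14_58.xlsx, and the 36-month sample size is an assumption.

```python
import numpy as np

# Hypothetical overhead costs standing in for the OHCost column of P14_58.xlsx.
rng = np.random.default_rng(0)
overhead = rng.normal(100000, 8000, 36)

# Predicting every month by the overall mean gives these "errors".
errors = overhead - overhead.mean()

# Sum of squared errors about the mean (often called the total sum of squares).
sst = np.sum(errors ** 2)
print(f"Sum of squared errors about the mean: {sst:,.0f}")
```

This quantity is (n - 1) times the sample variance of Overhead Cost, which gives a quick cross-check against a spreadsheet variance function.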

c. Now run three regressions: (1) Overhead Cost (OHCost) versus Machine Hours, (2) OHCost versus Direct Material Cost, and (3) OHCost versus both Machine Hours and Direct Material Cost. (The first two are simple regressions; the third is a multiple regression.) For each, find the sum of squared residuals and divide it by the sum of squared errors from part b. What is the relationship between this ratio and the associated R² for that equation? (Now do you see why R² is referred to as the percentage of variation explained?)
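One possible way to set up the comparison in part c is sketched below, again on simulated stand-in data rather than the actual workbook. The helper `fit` and all variable names are hypothetical. Here R² is computed from the explained sum of squares, so that comparing it with the residual ratio is a genuine check rather than a restatement of a definition.

```python
import numpy as np

# Hypothetical stand-in data; the real values live in P14_58.xlsx.
rng = np.random.default_rng(0)
n = 36  # assumed number of monthly observations
machine_hours = rng.normal(1500, 200, n)
direct_material = rng.normal(30000, 5000, n)
overhead = 4000 + 40 * machine_hours + 0.8 * direct_material + rng.normal(0, 3000, n)


def fit(y, *xs):
    """Least-squares fit with an intercept; returns the fitted values."""
    X = np.column_stack([np.ones(len(y)), *xs])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef


# Sum of squared errors when predicting by the mean (part b).
sst = np.sum((overhead - overhead.mean()) ** 2)

models = {
    "MachHrs only": (machine_hours,),
    "DirMatCost only": (direct_material,),
    "Both": (machine_hours, direct_material),
}

results = {}
for label, xs in models.items():
    fitted = fit(overhead, *xs)
    ss_resid = np.sum((overhead - fitted) ** 2)             # sum of squared residuals
    ss_explained = np.sum((fitted - overhead.mean()) ** 2)  # variation captured by the fit
    ratio = ss_resid / sst
    r2 = ss_explained / sst
    results[label] = (ratio, r2)
    print(f"{label:>16}: ratio = {ratio:.4f}, R2 = {r2:.4f}, sum = {ratio + r2:.4f}")
```

In each of the three fits the ratio and R² add to 1, so the ratio is the fraction of variation left unexplained.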

d. For the first two regressions in part c, what is the relationship between R² and the corresponding correlation between the dependent and explanatory variable? For the third regression it turns out that the R² can be expressed as a complicated function of all three correlations in part a. That is, the function involves not just the correlations between the dependent variable and each explanatory variable, but also the correlation between the explanatory variables. Note that this R² is not just the sum of the R² values from the first two regressions in part c. Why do you think this is true, intuitively? However, R² for the multiple regression is still the square of a correlation, namely the correlation between the observed and predicted values of OHCost. Verify that this is the case for these data.
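The checks in part d might be sketched as follows, once more on simulated stand-in data with hypothetical helper names (`r_squared`, `corr`). The closed-form expression used for the two-predictor case is the standard identity R² = (r_y1² + r_y2² - 2·r_y1·r_y2·r_12) / (1 - r_12²). The problem itself does not spell this formula out, so treat its use here as an added assumption about which function is meant.

```python
import numpy as np

# Hypothetical stand-in data; the real values live in P14_58.xlsx.
rng = np.random.default_rng(0)
n = 36  # assumed number of monthly observations
machine_hours = rng.normal(1500, 200, n)
direct_material = rng.normal(30000, 5000, n)
overhead = 4000 + 40 * machine_hours + 0.8 * direct_material + rng.normal(0, 3000, n)


def r_squared(y, *xs):
    """Return (R^2, fitted values) for a least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), *xs])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    ss_resid = np.sum((y - fitted) ** 2)
    ss_total = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_resid / ss_total, fitted


def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]


r_y1 = corr(overhead, machine_hours)
r_y2 = corr(overhead, direct_material)
r_12 = corr(machine_hours, direct_material)

r2_mh, _ = r_squared(overhead, machine_hours)
r2_dm, _ = r_squared(overhead, direct_material)
r2_both, fitted_both = r_squared(overhead, machine_hours, direct_material)

# Simple regressions: R^2 equals the squared correlation.
print(f"MachHrs:    R2 = {r2_mh:.4f}, r^2 = {r_y1 ** 2:.4f}")
print(f"DirMatCost: R2 = {r2_dm:.4f}, r^2 = {r_y2 ** 2:.4f}")

# Multiple regression: R^2 from all three pairwise correlations.
r2_formula = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

# Multiple regression: squared correlation of observed and predicted values.
r2_obs_pred = corr(overhead, fitted_both) ** 2

print(f"Both: R2 = {r2_both:.4f}, formula = {r2_formula:.4f}, "
      f"corr(obs, pred)^2 = {r2_obs_pred:.4f}")
print(f"Sum of the two simple R2 values = {r2_mh + r2_dm:.4f}")
```

In the identity above, the multiple R² reduces to the sum of the two simple R² values only when the correlation between the explanatory variables is zero.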