How good is my regression model?
This page takes us through a question from an old copy of Charles T Horngren's Cost Accounting textbook. The purpose of the exercise is that we are given a set of data from which we are to derive and comment on a regression model that we can use to predict cost values.
What follows is the table of data, a table that takes us step by step through a comparison of the validity of the variables in the data set; and three graphs that give us an insight into the normality or otherwise of the variables.
By the end of the page, we have a much deeper insight into the model, the variables in the model and the usefulness of the model we might finally decide to use.
The data set
The results of the regression analysis: including the Durbin Watson statistic: we ar dealing here with time series data; and we use the Durbin Watson technique to assess the possibility of autocorrelation being present in our data.
The table that takes us through the comparison of the variables in the data set
Graphs to accompany the normality analysis section of the table. The graphs show the skewness coefficient derived from the =SKEW() function in Excel together with the plot of the residuals for each variable.



The comparison table has been based on the table found in Horngren and Foster on page 780: the contents of the table found here are my own,
Work through your own example and apply the analysis you see here. Let me know if it helps ... or not!
References
Charles T Horngren & George Foster(1987)
Cost Accounting: a managerial emphasis
6th edition
Prentice Hall
What follows is the table of data, a table that takes us step by step through a comparison of the validity of the variables in the data set; and three graphs that give us an insight into the normality or otherwise of the variables.
By the end of the page, we have a much deeper insight into the model, the variables in the model and the usefulness of the model we might finally decide to use.
The data set
Year | Non academic overhead costs | Number of non academic staff | Number of student applications | Number of enrolled students |
1 | 2200 | 29 | 1010 | 342 |
2 | 4120 | 36 | 1217 | 496 |
3 | 3310 | 49 | 927 | 256 |
4 | 4410 | 53 | 1050 | 467 |
5 | 4210 | 54 | 1563 | 387 |
6 | 5440 | 58 | 1127 | 492 |
7 | 5600 | 88 | 1892 | 513 |
8 | 4380 | 72 | 1362 | 387 |
9 | 5270 | 83 | 1623 | 346 |
10 | 7610 | 73 | 1646 | 487 |
11 | 8070 | 101 | 1870 | 564 |
12 | 10388 | 103 | 1253 | 764 |
The results of the regression analysis: including the Durbin Watson statistic: we ar dealing here with time series data; and we use the Durbin Watson technique to assess the possibility of autocorrelation being present in our data.
Regression 1 | ||
Overhead costs = f(number of non academic staff) | ||
variable | coefficient | standard error |
constant | 112.04 | 1119.4 |
independent variable 1: number of staff | 79.68 | 15.89 |
r2 = | 0.72 | |
standard error of residuals = | 1269.2 | |
Durbin Watson = | 1.82 | |
Regression 2 | ||
Overhead costs = f(number of student applicants) | ||
variable | coefficient | standard error |
constant | 1147.4 | 2710 |
independent variable 2: number of applicants | 3.1 | 1.91 |
r2 = | 0.21 | |
standard error of residuals = | 2118.7 | |
Durbin Watson = | 0.89 | |
Regression 3 | ||
Overhead costs = f(number of enrolled students) | ||
variable | coefficient | standard error |
constant | -1382.2 | 1350.3 |
independent variable 3: number of students | 14.83 | 2.84 |
r2 = | 0.73 | |
standard error of residuals = | 1233 | |
Durbin Watson = | 1.09 |
The table that takes us through the comparison of the variables in the data set
Regression A | Regression B | Regression C | |
Criterion | Non academic staff | Student applications | Enrolled students |
1 Economic Plausibility | One would expect that there is a relationship between the number of non academic staff and non academic overheads | The relationship between non academic overheads and student applications should be small unless the most significant non academic job is dealing with student applications | Enrolled students must, by definition, be a significant driver of non academic costs |
2 Goodness of fit | r2 = 0.72 | r2 = 0.21 | r2 = 0.73 |
3 Significance of independent variables | t statistic of non academic staff of 5.02 is significant | t statistic of student applications of 1.62 is insignificant | t statistic of enrolled students of 5.22 is significant |
4 Specification analysis | |||
A linearity | Reasonably linear with a relatively high correlation coefficient of 0.72 | This data series returns a correlation coefficient of only 0.21 suggesting a poor relationship between student applications and non academic costs | Students enrolled values are reasonably linear |
B constant variance of residuals | We only have 12 data points for this exercise so our results are not that reliable. However, the constant variance assumption does not appear valid here … plot non academic costs against the residuals of non academic staff and see for yourself! | The same as with non academic staff! | There is some hope here: the residuals plot doesn’t seem to contradict the constant variance assumption with enrolled students. |
C independence of residuals | DW = 1.92 In general for such a sample size, this means that there is no problem of serial correlation (aka autocorrelation) | DW = 0.89 is outside of the range 1.3 – 2.7 thus there is the possibility that we are dealing with data that suffers from autocorrelation | DW = 1.09 is outside of the range 1.3 – 2.7 thus there is the possibility that we are dealing with data that suffers from autocorrelation |
D normality of residuals | Using SKEW from Excel this distribution of residuals is slightly skewed | This distribution of residuals is heavily skewed | This distribution of residuals is not skewed |
Graphs to accompany the normality analysis section of the table. The graphs show the skewness coefficient derived from the =SKEW() function in Excel together with the plot of the residuals for each variable.
Conclusions
This page has been prepared to help anyone who is carrying out a regression analysis of a time series data set.The comparison table has been based on the table found in Horngren and Foster on page 780: the contents of the table found here are my own,
Work through your own example and apply the analysis you see here. Let me know if it helps ... or not!
References
Charles T Horngren & George Foster(1987)
Cost Accounting: a managerial emphasis
6th edition
Prentice Hall
No comments:
Post a Comment