Sunday, July 17, 2011

Regression Analysis



How good is my regression model?
This page takes us through a question from an old copy of Charles T Horngren's Cost Accounting textbook. The purpose of the exercise is that we are given a set of data from which we are to derive and comment on a regression model that we can use to predict cost values.
What follows is the table of data, a table that takes us step by step through a comparison of the validity of the variables in the data set; and three graphs that give us an insight into the normality or otherwise of the variables.
By the end of the page, we have a much deeper insight into the model, the variables in the model and the usefulness of the model we might finally decide to use.
The data set

Year
Non academic overhead costs
Number of non academic staff
Number of student applications
Number of enrolled students
1
2200
29
1010
342
2
4120
36
1217
496
3
3310
49
927
256
4
4410
53
1050
467
5
4210
54
1563
387
6
5440
58
1127
492
7
5600
88
1892
513
8
4380
72
1362
387
9
5270
83
1623
346
10
7610
73
1646
487
11
8070
101
1870
564
12
10388
103
1253
764

The results of the regression analysis: including the Durbin Watson statistic: we ar dealing here with time series data; and we use the Durbin Watson technique to assess the possibility of autocorrelation being present in our data.

Regression 1  
Overhead costs = f(number of non academic staff)  
   
variablecoefficientstandard error
constant112.041119.4
independent variable 1: number of staff79.6815.89
r2 =0.72 
standard error of residuals =1269.2 
Durbin Watson =1.82 
   
Regression 2  
Overhead costs = f(number of student applicants)  
   
variablecoefficientstandard error
constant1147.42710
independent variable 2: number of applicants3.11.91
r2 =0.21 
standard error of residuals =2118.7 
Durbin Watson =0.89 
   
Regression 3  
Overhead costs = f(number of enrolled students)  
   
variablecoefficientstandard error
constant-1382.21350.3
independent variable 3: number of students14.832.84
r2 =0.73 
standard error of residuals =1233 
Durbin Watson =1.09 

The table that takes us through the comparison of the variables in the data set

 Regression ARegression BRegression C
CriterionNon academic staffStudent applicationsEnrolled students
1 Economic PlausibilityOne would expect that there is a relationship between the number of non academic staff and non academic overheadsThe relationship between non academic overheads and student applications should be small unless the most significant non academic job is dealing with student applicationsEnrolled students must, by definition, be a significant driver of non academic costs
2 Goodness of fitr2 = 0.72r2 = 0.21r2 = 0.73
3 Significance of independent variablest statistic of non academic staff of 5.02 is significantt statistic of student applications of 1.62 is insignificantt statistic of enrolled students of 5.22 is significant
4 Specification analysis   
A linearityReasonably linear with a relatively high correlation coefficient of 0.72This data series returns a correlation coefficient of only 0.21 suggesting a poor relationship between student applications and non academic costsStudents enrolled values are reasonably linear
B constant variance of residualsWe only have 12 data points for this exercise so our results are not that reliable. However, the constant variance assumption does not appear valid here … plot non academic costs against the residuals of non academic staff and see for yourself!The same as with non academic staff!There is some hope here: the residuals plot doesn’t seem to contradict the constant variance assumption with enrolled students.
C independence of residualsDW = 1.92 In general for such a sample size, this means that there is no problem of serial correlation (aka autocorrelation)DW = 0.89 is outside of the range 1.3 – 2.7 thus there is the possibility that we are dealing with data that suffers from autocorrelationDW = 1.09 is outside of the range 1.3 – 2.7 thus there is the possibility that we are dealing with data that suffers from autocorrelation
D normality of residualsUsing SKEW from Excel this distribution of residuals is slightly skewed This distribution of residuals is heavily skewed This distribution of residuals is not skewed 

Graphs to accompany the normality analysis section of the table. The graphs show the skewness coefficient derived from the =SKEW() function in Excel together with the plot of the residuals for each variable.



Conclusions
This page has been prepared to help anyone who is carrying out a regression analysis of a time series data set.
The comparison table has been based on the table found in Horngren and Foster on page 780: the contents of the table found here are my own,
Work through your own example and apply the analysis you see here. Let me know if it helps ... or not!
References
Charles T Horngren & George Foster(1987)
Cost Accounting: a managerial emphasis
6th edition
Prentice Hall

No comments:

Post a Comment