Statistics for Environmental Engineers


The maximum possible value for $R^2$ when there are repeat measurements is:

$$R^2_{\max} = \frac{\text{Total SS (corrected)} - \text{Pure error SS}}{\text{Total SS (corrected)}}$$

The pure error SS does not change when terms are added or removed from the model in an effort to improve the fit. For our example:

$$R^2_{\max} = \frac{697.5 - 112.3}{697.5} = 0.839$$

The actual $R^2$ = 581.12/697.5 = 0.833. Therefore, the regression has explained 100(0.833/0.839) = 99% of the variation that can be explained by the model.
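The arithmetic above can be reproduced with a minimal Python sketch. The sums of squares are the ones quoted in the text; the function and variable names are illustrative assumptions, not notation from the book.

```python
def max_r_squared(total_ss, pure_error_ss):
    """Largest R^2 any model can reach when pure error SS is fixed by the repeats."""
    return (total_ss - pure_error_ss) / total_ss


def r_squared(regression_ss, total_ss):
    """Fraction of the corrected total SS explained by the regression."""
    return regression_ss / total_ss


# Values quoted in the text for the worked example.
total_ss = 697.5        # Total SS (corrected)
pure_error_ss = 112.3   # Pure error SS from the repeat measurements
regression_ss = 581.12  # Regression SS

r2_max = max_r_squared(total_ss, pure_error_ss)
r2 = r_squared(regression_ss, total_ss)
explained_pct = 100 * r2 / r2_max  # share of the explainable variation

print(f"max R^2 = {r2_max:.3f}")
print(f"R^2     = {r2:.3f}")
print(f"explained share of explainable variation = {explained_pct:.0f}%")
```

Because both quantities share the same denominator, the percentage reduces to 100 × 581.12/(697.5 − 112.3); it does not depend on the total SS except through the explainable part.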

A Note on Lack-Of-Fit

If repeat measurements are available, a lack-of-fit (LOF) test can be done. The lack-of-fit mean square $s_L^2 = SS_{LOF}/df_{LOF}$ is compared with the pure error mean square $s_e^2 = SS_{PE}/df_{PE}$. If the model gives an adequate fit, these two mean squares should be of the same magnitude. This is checked by comparing the ratio $s_L^2/s_e^2$ against the critical value of the F statistic with the appropriate degrees of freedom. Using the values in Table 39.4 gives $s_L^2/s_e^2 = 1.35/11.23 = 0.12$. The F statistic for a test at the 95% confidence level with three degrees of freedom to measure lack of fit and ten degrees of freedom to measure the pure error is $F_{3,10} = 3.71$. Because $s_L^2/s_e^2 = 0.12$ is less than $F_{3,10} = 3.71$, there is no evidence of lack of fit. For this lack-of-fit test to be valid, true repeats are needed.
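As a rough sketch of the comparison just described, the ratio of mean squares can be checked against the tabulated critical value in a few lines of Python. The mean squares and the critical value 3.71 are taken from the text; the helper name `lack_of_fit_test` is a hypothetical one chosen for illustration.

```python
def lack_of_fit_test(ms_lof, ms_pe, f_critical):
    """Return the LOF F ratio and whether it exceeds the critical value."""
    ratio = ms_lof / ms_pe
    return ratio, ratio > f_critical


# Values quoted in the text (Table 39.4).
ms_lof = 1.35        # lack-of-fit mean square, 3 degrees of freedom
ms_pe = 112.3 / 10   # pure error SS / pure error df = 11.23
f_crit = 3.71        # tabulated F for 3 and 10 df at the 95% level

ratio, has_lack_of_fit = lack_of_fit_test(ms_lof, ms_pe, f_crit)

print(f"LOF ratio = {ratio:.2f}, critical F = {f_crit}")
print("evidence of lack of fit" if has_lack_of_fit else "no evidence of lack of fit")
```

The critical value is passed in as a plain number so the sketch needs no statistics library; with a different confidence level or different degrees of freedom, the value would have to be looked up again.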

© 2002 By CRC Press LLC

A Note on Description vs. Prediction

Is the regression useful? We have seen that a high $R^2$ does not guarantee that a regression has meaning. Likewise, a low $R^2$ may accompany a statistically significant relationship between two variables even though the regression is not explaining much of the variation. Still less does statistical significance mean that the regression will predict future observations with much accuracy. “In order for the fitted equation to be regarded as a satisfactory predictor, the observed F ratio (regression mean square/residual mean square) should exceed not merely the selected percentage point of the F distribution, but several times the selected percentage point. How many times depends essentially on how great a ratio (prediction range/error of prediction) is specified” (Box and Wetz, 1973). Draper and Smith (1998) offer this rule of thumb: unless the observed F for overall regression exceeds the chosen test percentage point by at least a factor of four, and preferably more, the regression is unlikely to be of practical value for prediction purposes. The regression in Figure 39.4 has an F ratio of 581.12/8.952 = 64.91 and would have some practical predictive value.
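The factor-of-four rule of thumb can be sketched as a simple check. The mean squares are those quoted in the text. The critical value is an assumption: the text does not state it, and 4.67 is the tabulated 95% point of F with 1 and 13 degrees of freedom, which presumes one regression degree of freedom (the regression mean square equals the regression SS of 581.12) and 13 residual degrees of freedom (3 for lack of fit plus 10 for pure error). The function name is hypothetical.

```python
def exceeds_prediction_threshold(f_observed, f_critical, factor=4.0):
    """Apply the rule of thumb: observed F should be at least `factor` times critical F."""
    multiple = f_observed / f_critical
    return multiple, multiple >= factor


# Mean squares quoted in the text for the regression in Figure 39.4.
regression_ms = 581.12
residual_ms = 8.952
f_observed = regression_ms / residual_ms

# Assumption: tabulated F(0.95; 1, 13); not stated in the text.
f_critical = 4.67

multiple, useful = exceeds_prediction_threshold(f_observed, f_critical)

print(f"observed F = {f_observed:.2f}")
print(f"observed F is {multiple:.1f} times the critical value")
print("passes the factor-of-four rule" if useful else "fails the factor-of-four rule")
```

The `factor` argument is left adjustable because the quoted guidance says "at least a factor of four, and preferably more"; a stricter threshold can be tried without changing the function.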
