Statistics for Environmental Engineers

Selecting the “Best” Regression Model

The “best” model is the one that adequately describes the data with the fewest parameters. Table 38.3 summarizes parameter estimates, the coefficient of determination R², and the regression sum of squares for all eight possible linear models. The total sum of squares, of course, is the same in all eight cases because it depends on the data and not on the form of the model. Standard errors [SE] and t ratios (in parentheses) are given for the complete model, Model A.

One approach is to examine the t ratio for each parameter. Roughly speaking, if the absolute value of a parameter’s t ratio is less than 2.5, the true value of the parameter could be zero and that term could be dropped from the equation.

Another approach is to examine the confidence intervals of the estimated parameters. If an interval includes zero, the variable associated with that parameter can be dropped from the model. For example, in Model A, the coefficient of z2 is b3 = -1.13 with a standard error of 1.1 and a 95% confidence interval of [-3.88, +1.62]. This confidence interval includes zero, indicating that the true value of the coefficient could plausibly be zero, and therefore the term z2 can be tentatively dropped from the model. Fitting the simplified model (without z2) gives Model B in Table 38.3.
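The two screening checks described so far can be sketched in a few lines of Python. This is only an illustration using the values quoted for b3. The function names are hypothetical, and the multiplier of 2.5 is an assumption back-calculated from the reported interval (half-width 2.75 divided by SE 1.1), not a value stated in the text.

```python
def confidence_interval(estimate, std_error, multiplier):
    """Return (low, high) for estimate +/- multiplier * std_error."""
    half_width = multiplier * std_error
    return estimate - half_width, estimate + half_width


def includes_zero(interval):
    """True if zero lies inside the closed interval."""
    low, high = interval
    return low <= 0.0 <= high


# Values quoted in the text for the coefficient of z2 in Model A.
b3 = -1.13
se_b3 = 1.1

# Assumption: multiplier implied by the reported interval [-3.88, +1.62].
multiplier = 2.5

ci = confidence_interval(b3, se_b3, multiplier)
t_ratio = b3 / se_b3

print(f"95% CI: [{ci[0]:.2f}, {ci[1]:.2f}]")
print(f"interval includes zero: {includes_zero(ci)}")
print(f"t ratio: {t_ratio:.2f}, |t| < 2.5: {abs(t_ratio) < 2.5}")
```

Under these assumptions, both checks point the same way: the interval spans zero and the magnitude of the t ratio is well below 2.5, so z2 is a candidate for removal.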

In Table 38.3, the standard error [SE] is the number in brackets. The half-width of the 95% confidence interval is a multiple of the standard error of the estimated value. The multiplier is a t statistic that depends on the selected level of confidence and the degrees of freedom. This multiplier is not the same value as the t ratio given in Table 38.3. Roughly speaking, if the degrees of freedom are large (n - p > 20), the half-width of the confidence interval is about 2SE for a 95% confidence interval. If the degrees of freedom are small (n - p < 10), the half-width will be in the range of 2.3SE to 3.0SE.
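To see how the multiplier changes with the degrees of freedom, the sketch below uses a short lookup table of standard two-sided 95% critical values of Student's t distribution (the kind found in any statistics table). The names `T_95` and `half_width` are hypothetical, and the SE of 1.1 is borrowed from the b3 example purely for illustration.

```python
# Two-sided 95% critical values of Student's t (upper 2.5% point),
# keyed by degrees of freedom (n - p). Standard tabulated values.
T_95 = {
    3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447, 7: 2.365,
    8: 2.306, 9: 2.262, 10: 2.228, 20: 2.086, 30: 2.042,
    60: 2.000, 120: 1.980,
}


def half_width(std_error, dof):
    """Half-width of the 95% confidence interval for a tabulated dof."""
    return T_95[dof] * std_error


se = 1.1  # illustrative standard error (assumption: taken from the b3 example)
for dof in sorted(T_95):
    print(f"dof = {dof:3d}  multiplier = {T_95[dof]:.3f}  "
          f"half-width = {half_width(se, dof):.2f}")
```

The tabulated multipliers settle close to 2 once the degrees of freedom exceed about 20, and they lie roughly between 2.3 and 3.2 for fewer than 10 degrees of freedom, which is consistent with the rule of thumb above.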
