Statistics for Environmental Engineers


Widely used methods are frequently misused. Linear regression, the most widely used statistical method, can be misused or misinterpreted if one relies too heavily on R² to characterize how well a model fits.

R² measures the proportion of the variation in y that is accounted for by fitting y to a particular linear model instead of describing the data by their mean (a horizontal straight line). A high R² does not prove that a model is correct or useful. A low R² may accompany a statistically significant relation between two variables even though the regression has no practical predictive value. Replication dramatically reduces the prediction error of a model, and it makes a formal lack-of-fit test possible, but it reduces the R² of the model.
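The definition above corresponds to the standard formula R² = 1 - SS_res/SS_tot, where SS_res is the residual sum of squares about the fitted line and SS_tot is the sum of squares about the mean. The following is a minimal sketch in Python, not taken from the book; the function names `fit_line` and `r_squared` and the data are hypothetical, chosen only for illustration. It fits a straight line to data that are exactly quadratic, so the linear model is wrong by construction, and the R² is still high.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


def r_squared(x, y):
    """R^2 = 1 - SS_res / SS_tot for a straight-line fit of y on x."""
    b0, b1 = fit_line(x, y)
    y_mean = sum(y) / len(y)
    # Variation left over after fitting the line.
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # Variation about the mean (the "horizontal straight line" model).
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: y is exactly x squared, so a straight line is the wrong model.
x_demo = list(range(1, 11))
y_demo = [xi ** 2 for xi in x_demo]
r2_demo = r_squared(x_demo, y_demo)
print(f"R^2 of a straight line fitted to quadratic data: {r2_demo:.3f}")
```

A plot of the residuals against x would show the curvature that the single number hides, which is why graphical analysis should accompany any regression.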

Totally spurious correlations, often with high R² values, can arise when unrelated variables are combined. Two examples of particular interest to environmental engineers are presented by Sherwood (1974) and Rowe (1974). Both emphasize graphical analysis to stimulate and support any regression analysis. Rowe discusses the particular dangers that arise when sets of variables are combined to create new variables such as dimensionless numbers (Froude number, etc.). Benson (1965) points out the same kinds of dangers in the context of hydraulics and hydrology.
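One way such a spurious correlation can arise is when two unrelated quantities are divided by a common third variable. The sketch below is a hypothetical simulation, not an example from Sherwood, Rowe, or Benson; the variable names, the ranges, and the random seed are assumptions made only for illustration. It draws three independent variables and compares the squared correlation of x with y against that of the ratios x/z and y/z.

```python
import math
import random


def pearson_r(a, b):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    a_mean = sum(a) / n
    b_mean = sum(b) / n
    sab = sum((ai - a_mean) * (bi - b_mean) for ai, bi in zip(a, b))
    saa = sum((ai - a_mean) ** 2 for ai in a)
    sbb = sum((bi - b_mean) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 500

# Three mutually independent variables (assumed ranges, chosen arbitrarily).
x = [rng.uniform(1.0, 2.0) for _ in range(n)]
y = [rng.uniform(1.0, 2.0) for _ in range(n)]
z = [rng.uniform(0.1, 1.0) for _ in range(n)]

# New variables built by dividing both x and y by the shared variable z.
x_over_z = [xi / zi for xi, zi in zip(x, z)]
y_over_z = [yi / zi for yi, zi in zip(y, z)]

r2_raw = pearson_r(x, y) ** 2
r2_ratio = pearson_r(x_over_z, y_over_z) ** 2

print(f"R^2 between x and y:     {r2_raw:.3f}")
print(f"R^2 between x/z and y/z: {r2_ratio:.3f}")
```

The ratios are strongly correlated only because both carry the variation of the shared divisor z, not because x and y are related.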


Anderson-Sprecher, R. (1994). “Model Comparison and R²,” Am. Stat., 48(2), 113-116.

Anscombe, F. J. (1973). “Graphs in Statistical Analysis,” Am. Stat., 27, 17-21.

Benson, M. A. (1965). “Spurious Correlation in Hydraulics and Hydrology,” J. Hydraulics Div., ASCE, 91, HY4, 35-45.
