Statistics for Environmental Engineers


Regression begins with the specification of a model to be fitted. One goal is to find a parsimonious model: an adequate model with the fewest possible terms. Sometimes the proposed model turns out to be too simple and must be augmented with additional terms. The much more common case, however, is to start with more terms than are needed or justified. This is called overfitting. Overfitting is harmful because, for a model fitted by least squares, the average variance of the predicted values increases in proportion to the number of parameters in the model.
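The cost of extra parameters can be checked numerically. The sketch below assumes ordinary least squares with independent errors of constant variance; under those assumptions the variance of the i-th predicted value is the error variance times the i-th diagonal element of the hat matrix, and the average over the n design points works out to p times the error variance divided by n, where p is the number of parameters. The function name, the polynomial design, and the values of n and the error variance are illustrative choices, not taken from the text.

```python
import numpy as np

def average_prediction_variance(X, sigma2):
    """Average of Var(y_hat_i) over the n design points for ordinary least squares."""
    # Hat matrix H = X (X'X)^-1 X', computed here through the pseudoinverse.
    # Var(y_hat_i) = sigma2 * H[i, i], so the average is sigma2 * trace(H) / n.
    H = X @ np.linalg.pinv(X)
    return sigma2 * np.trace(H) / X.shape[0]

# Illustrative design: n = 20 equally spaced points, polynomial models
x = np.linspace(0.0, 1.0, 20)
sigma2 = 1.0  # assumed error variance

for p in (2, 3, 5):
    X = np.vander(x, p, increasing=True)  # columns 1, x, ..., x^(p-1)
    print(p, round(float(average_prediction_variance(X, sigma2)), 4))
```

With the same data and the same error variance, each unneeded term raises the average prediction variance by the error variance divided by n, which is the sense in which an overfitted model predicts less precisely.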

A fitted model is always checked for inadequacies. The statistical output of regression programs is somewhat helpful in doing this, but a more satisfying and useful approach is to make diagnostic plots of the residuals. At a minimum, the residuals should be plotted against the predicted values of the fitted model. Plots of residuals against the independent variables are also useful. This chapter illustrates how this diagnosis is used to decide whether terms should be added to or dropped from a model to improve it. If a tentative model is modified, it is refitted and rechecked. The model builder thus works iteratively toward the simplest adequate model.
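The fit, check, and refit cycle can be sketched in a few lines. The example below uses made-up, noise-free data with curvature (purely for illustration, not data from the text), fits a straight line, and lists the residuals against the predicted values. The helper name `fit_and_residuals` is hypothetical. In practice the two printed columns would be drawn as a scatter plot rather than read as a table.

```python
import numpy as np

def fit_and_residuals(X, y):
    """Least-squares fit; returns (coefficients, predicted values, residuals)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    predicted = X @ beta
    return beta, predicted, y - predicted

# Made-up, noise-free data with curvature (illustration only)
x = np.linspace(0.0, 1.0, 11)
y = 1.0 + 2.0 * x + 3.0 * x**2

# Tentative model: straight line. Augmented model: adds a quadratic term.
X_line = np.column_stack([np.ones_like(x), x])
X_quad = np.column_stack([np.ones_like(x), x, x**2])

_, pred_line, res_line = fit_and_residuals(X_line, y)
_, pred_quad, res_quad = fit_and_residuals(X_quad, y)

# Residuals versus predicted values for the straight-line fit
for yhat, r in zip(pred_line, res_line):
    print(f"{yhat:6.3f}  {r:+.3f}")
```

For the straight-line fit the residuals are positive at both ends of the range and negative in the middle, a systematic pattern that suggests a missing term. After the quadratic term is added and the model is refitted, the residuals in `res_quad` are zero to within rounding error for this noise-free example.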

A Model of Sedimentation

Sedimentation removes solid particles from a liquid by allowing them to settle under quiescent conditions. An ideal sedimentation process can be created in the laboratory in the form of a batch column. The column is filled with the suspension (turbid river water, industrial wastewater, or sewage) and samples are taken over time from sampling ports located at several depths along the column. Sedimentation efficiency is measured by the solids concentration (or the fraction of solids removed), recorded as a function of time and depth.
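One way to organize such column data is as a depth-by-time array of concentrations, from which the fraction of solids removed follows as one minus the ratio of the measured concentration to the initial concentration. The sketch below assumes that definition of fraction removed; the function name, the units, and all numbers are hypothetical values chosen only to show the layout.

```python
import numpy as np

def fraction_removed(conc, c0):
    """Fraction of solids removed, 1 - C/C0, computed elementwise."""
    return 1.0 - np.asarray(conc, dtype=float) / c0

# Hypothetical values for illustration only (not data from the text)
c0 = 500.0                 # initial solids concentration, mg/L
depths_m = [0.5, 1.5]      # sampling port depths, m (one row each)
times_min = [20, 40, 60]   # sampling times, min (one column each)
conc = [[400.0, 300.0, 250.0],
        [450.0, 400.0, 350.0]]

removed = fraction_removed(conc, c0)
for depth, row in zip(depths_m, removed):
    print(depth, np.round(row, 2))
```

Keeping depth and time as the two array axes makes it straightforward to use either one, or both, as independent variables when a model is fitted later.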
