Statistics for Environmental Engineers

Lurking variables. Sometimes important variables are not measured, for a variety of reasons. Such variables are called lurking variables. The problems they can cause are discussed by Box (1966) and Joiner (1981). A related problem occurs when a truly influential variable is carefully kept within a narrow range, with the result that the variable appears to be insignificant when it is included in a regression model.
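The narrow-range effect can be illustrated with a small simulation. The sketch below is not from the text: the linear model, the slope of 2, the noise level, and the function names are all assumptions chosen for illustration. The same relationship is sampled once over a wide range of x and once over a narrow range.

```python
import math
import random

def pearson_r(x, y):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate(x_low, x_high, n=200, slope=2.0, noise_sd=1.0, seed=1):
    """Hypothetical model y = slope * x + noise, with x drawn from [x_low, x_high]."""
    rng = random.Random(seed)
    x = [rng.uniform(x_low, x_high) for _ in range(n)]
    y = [slope * xi + rng.gauss(0.0, noise_sd) for xi in x]
    return pearson_r(x, y)

# Same true slope in both cases; only the range of x differs.
r_wide = simulate(0.0, 10.0)
r_narrow = simulate(4.9, 5.1)
print(f"correlation, x in [0, 10]:    {r_wide:.3f}")
print(f"correlation, x in [4.9, 5.1]: {r_narrow:.3f}")
```

Under these assumptions the variable is equally influential in both runs, yet the narrow-range data show only a weak correlation because the noise swamps the small variation in x.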

Nonconstant variance. Measurement error is often nearly proportional to the magnitude of the measured value rather than approximately constant over the range of measured values. Many measurement procedures and instruments introduce this property.
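A minimal sketch of proportional error, assuming a hypothetical instrument whose error has a standard deviation equal to 10% of the true value (the error model, the 10% figure, and the three concentration levels are illustrative assumptions, not values from the text):

```python
import random
import statistics

def replicate_stats(true_value, rel_sd=0.10, n=500, rng=None):
    """Simulate n replicate readings with error proportional to the true value.

    Returns the standard deviation and the relative standard deviation (SD/mean).
    """
    rng = rng or random.Random(0)
    readings = [true_value * (1.0 + rng.gauss(0.0, rel_sd)) for _ in range(n)]
    sd = statistics.stdev(readings)
    cv = sd / statistics.fmean(readings)
    return sd, cv

rng = random.Random(42)
results = {level: replicate_stats(level, rng=rng) for level in (1.0, 10.0, 100.0)}
for level, (sd, cv) in results.items():
    print(f"true value {level:6.1f}: SD = {sd:7.3f}, SD/mean = {cv:.3f}")
```

In this sketch the standard deviation grows roughly in step with the measured level while the ratio SD/mean stays about the same, which is the pattern that a constant-variance assumption would miss.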

Nonnormal distributions. We are strongly conditioned to think of data being symmetrically distributed about their average value in the bell shape of the normal distribution. Environmental data seldom have this distribution. A common asymmetric distribution has a long tail toward high values.
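To see what a long tail toward high values does to familiar summary statistics, the sketch below draws a sample from a lognormal distribution. The choice of the lognormal, its parameters, and the sample size are assumptions made for illustration; the passage does not name a specific distribution.

```python
import random
import statistics

def skewness(values):
    """Moment-based sample skewness (positive for a long right tail)."""
    n = len(values)
    m = statistics.fmean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(7)
# Assumed example: lognormal with log-scale mean 0 and log-scale SD 1.
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(1000)]

sample_mean = statistics.fmean(sample)
sample_median = statistics.median(sample)
sample_skew = skewness(sample)
frac_below_mean = sum(v < sample_mean for v in sample) / len(sample)

print(f"mean = {sample_mean:.2f}, median = {sample_median:.2f}")
print(f"skewness = {sample_skew:.2f}")
print(f"fraction of values below the mean = {frac_below_mean:.2f}")
```

For data like these the average sits above the median and well over half the observations fall below the average, so the average is not a "typical" value in the way it is for a symmetric distribution.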

Serial correlation. Many environmental data occur as a sequence of measurements taken over time or space. The order of the data is critical. In such data, adjacent values are often not statistically independent of each other because the natural continuity over time (or space) tends to make neighboring values more alike than randomly selected values. This property, called serial correlation, violates the assumptions on which many statistical procedures are based. Even low levels of serial correlation can distort estimation and hypothesis testing procedures.
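The distortion can be sketched with a simulation. The example below assumes a first-order autoregressive model, x[t] = phi * x[t-1] + e[t], with a modest phi of 0.3; the model, the parameter values, and the function names are illustrative assumptions rather than anything specified in the text. It compares the actual variability of the sample mean with the usual standard error s/sqrt(n), which assumes independent observations.

```python
import math
import random
import statistics

def ar1_series(n, phi, rng):
    """Generate x[t] = phi * x[t-1] + e[t] with e ~ N(0, 1), started in steady state."""
    x = [rng.gauss(0.0, 1.0 / math.sqrt(1.0 - phi ** 2))]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation coefficient."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[t] - m) * (x[t + 1] - m) for t in range(n - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den

def se_ratio(phi, n=100, reps=2000, seed=3):
    """Actual SD of the sample mean divided by the average naive SE, s / sqrt(n)."""
    rng = random.Random(seed)
    means, naive = [], []
    for _ in range(reps):
        x = ar1_series(n, phi, rng)
        means.append(statistics.fmean(x))
        naive.append(statistics.stdev(x) / math.sqrt(n))
    return statistics.stdev(means) / statistics.fmean(naive)

r1 = lag1_autocorr(ar1_series(2000, 0.3, random.Random(3)))
ratio_independent = se_ratio(0.0)
ratio_correlated = se_ratio(0.3)
print(f"lag-1 autocorrelation of one simulated series: {r1:.2f}")
print(f"actual SD of mean / naive SE, phi = 0.0: {ratio_independent:.2f}")
print(f"actual SD of mean / naive SE, phi = 0.3: {ratio_correlated:.2f}")
```

A ratio near 1 means the usual standard error is about right. A ratio above 1 means the naive formula understates the uncertainty in the mean, so confidence intervals come out too narrow and significance tests too optimistic.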

Complex cause-and-effect relationships. The systems of interest, the real systems in the field, are affected by dozens of variables, including many that cannot be controlled, some that cannot be measured accurately, and probably some that are unidentified. Even if the known variables were all controlled, as we try to do in the laboratory, the physics, chemistry, and biochemistry of the system are complicated and difficult to decipher. Even a system that is driven almost entirely by inorganic chemical reactions can be difficult to model (for example, because of chemical complexation and amorphous solids formation). The situation has been described by Box and Luceño (1997): “All models are wrong but some are useful.” Our ambition usually falls short of trying to discover all causes and effects. We are happy if we can find a useful model.
