Statistics for Environmental Engineers

Conversely, some pollutants may not exhibit their effect for years. Carcinogens are an example where the long-term average could be important. Long-term in this context means years, so the 30-day average would not be a particularly useful statistic. The first irritants ingested (or inhaled) may be more important than recently ingested material. If so, perhaps past events should be weighted more heavily than recent events if a statistic is to relate the source of pollution to its present effect. Choosing a statistic with the appropriate weighting could increase the value of the data to biologists, epidemiologists, and others who seek to relate pollutant discharges to effects on organisms.
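One way to make the idea of weighting concrete is a weighted average in which older observations receive larger weights. The sketch below is illustrative only: the function names, the geometric weighting scheme, and the `growth` parameter are assumptions made for this example, not a method prescribed in the text.

```python
def age_weights(n, growth=1.1):
    """Weights for n observations ordered oldest to newest.

    Assumed scheme: each step back in time multiplies the weight by
    `growth`, so the oldest observation gets the largest weight.
    """
    if n < 1:
        raise ValueError("need at least one observation")
    return [growth ** (n - 1 - i) for i in range(n)]


def weighted_average(values, weights):
    """Weighted average: sum(w * x) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical record, oldest first: one early exposure followed by none.
record = [10.0, 0.0, 0.0]
past_weighted = weighted_average(record, age_weights(len(record), growth=2.0))
simple_mean = weighted_average(record, [1.0] * len(record))
```

With these hypothetical numbers the past-weighted average (40/7, about 5.7) exceeds the simple mean (10/3, about 3.3), because the early exposure counts for more. Setting `growth` below 1 would instead emphasize recent events.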

Plotting on a Logarithmic Scale

The top panel of Figure 4.1 is a plot of influent copper concentration at a wastewater treatment plant. This plot emphasizes the few high values, especially those at days 225, 250, and 340. The bottom panel shows the same data on a logarithmic scale. Now the process behavior appears more consistent. The low values are more evident, and the high values do not seem so extreme. The episode around day 250 still looks unusual, but the values at days 225 and 340 are above the average (on the log scale) by about the same amount that the lowest values are below average.
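The effect of the logarithmic scale can be checked numerically by comparing deviations from the average on the original scale with deviations on the log scale. The sketch below uses made-up concentrations, not the copper data of Figure 4.1, and the function names are assumptions chosen for this illustration.

```python
import math


def deviations_from_mean(values):
    """Deviation of each value from the arithmetic mean of the values."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def log10_deviations(values):
    """Deviations from the mean after a base-10 log transformation."""
    if any(v <= 0 for v in values):
        raise ValueError("logarithms require strictly positive values")
    return deviations_from_mean([math.log10(v) for v in values])


# Hypothetical concentrations spanning two orders of magnitude.
concentrations = [10.0, 100.0, 1000.0]
raw_dev = deviations_from_mean(concentrations)
log_dev = log10_deviations(concentrations)
```

On the original scale the deviations are -360, -270, and +630, so the high value dominates. On the log scale they are -1, 0, and +1, so the high and low values are equally far from the average, which is the pattern described for days 225 and 340.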

Are the high values so extraordinary as to deserve special attention? Or are they rogue values (outliers) that can be disregarded? This question cannot be answered without knowing the underlying distribution of the data. If the underlying process naturally generates data with a lognormal distribution, the high values fit the general pattern of the data record.
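If a lognormal distribution is a plausible assumption, a rough screen is to ask how far each value lies from the mean of the log-transformed data, in units of the standard deviation of the logs. The sketch below is only an informal check, not a formal outlier test: the function names and the threshold of 3 standard deviations are assumptions for this example, and an extreme value inflates the standard deviation used to judge it.

```python
import math
import statistics


def log_z_scores(values):
    """Standardized deviations of ln(values) from their mean."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    if any(v <= 0 for v in values):
        raise ValueError("logarithms require strictly positive values")
    logs = [math.log(v) for v in values]
    mean_log = statistics.mean(logs)
    sd_log = statistics.stdev(logs)  # sample standard deviation
    if sd_log == 0:
        raise ValueError("values are all identical")
    return [(x - mean_log) / sd_log for x in logs]


def flag_unusual(values, threshold=3.0):
    """Indices of values whose log-scale z-score exceeds the threshold.

    The threshold of 3 is an assumed rule of thumb, not a value from the text.
    """
    return [i for i, z in enumerate(log_z_scores(values)) if abs(z) > threshold]


# Hypothetical record: thirty ordinary values and one very large one.
record = [9.0, 10.0, 11.0] * 10 + [1.0e6]
flagged = flag_unusual(record)
```

A value that is not flagged on the log scale is consistent with the general pattern of the record under the lognormal assumption. A flagged value is only a candidate for closer inspection, not something to be discarded automatically.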

