Statistics for Environmental Engineers



Censored data are, in essence, missing values. Missing values in data records are common and they are not always a serious problem. If 50 specimens were collected and five of them, selected at random, were damaged or lost, we could do the analysis as though there were only 45 observations. If a few values are missing at random intervals from a time series, they can be filled in without seriously distorting the pattern of the series. The difficulty with censored data is that missing values are not selected at random. They are all missing at one end of the distribution. We cannot go ahead as if they never existed because this would bias the final results.
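The contrast between values lost at random and values lost from one end of the distribution can be illustrated with a small simulation. This is a minimal sketch, not part of the original text: the lognormal population, the seed, and the variable names are assumptions chosen only for illustration.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical data: 50 specimens drawn from a lognormal population.
specimens = [rng.lognormvariate(0.0, 1.0) for _ in range(50)]

# Case 1: five specimens are damaged or lost at random.
lost_at_random = set(rng.sample(range(50), 5))
random_loss = [x for i, x in enumerate(specimens) if i not in lost_at_random]

# Case 2: the five lowest specimens are missing (censoring at the low end).
censored_loss = sorted(specimens)[5:]

print("mean of all 50 values:      ", round(statistics.mean(specimens), 3))
print("mean after random loss:     ", round(statistics.mean(random_loss), 3))
print("mean after low-end censoring:", round(statistics.mean(censored_loss), 3))
```

Dropping the five lowest values can only raise the sample mean, whereas dropping five values at random moves it up or down with no systematic direction.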


The odd feature of censored water quality data is that the censored values were not always missing. Some numerical value was measured, but the analytical chemist determined that the value was below the method detection limit (MDL) and reported "<MDL" instead of the number. A better practice is to report all values along with a statement of their precision and let the data analyst decide what weight the very low values should carry in the final interpretation. Some laboratories do this, but historical data records have already been censored and new censored data are still being produced. Methods are needed to interpret these.


Unfortunately, there is no generally accepted scheme for replacing the censored observations with some arbitrary values. Replacing censored observations with zero or 0.5 MDL gives estimates of the mean that are biased low and estimates of the variance that are biased high. Replacing the censored values with the MDL, or omitting the censored observations, gives estimates of the mean that are biased high and estimates of the variance that are biased low. The bias in both the mean and the variance increases as the fraction of censored observations increases or as the MDL increases (Berthouex and Brown, 1994).
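The effect of these substitution rules can be explored numerically. The following is a minimal sketch under stated assumptions: the "true" concentrations are simulated from a lognormal distribution, the MDL of 0.5 is arbitrary, and the function name `substitution_estimates` is hypothetical. It is not a recommended method for handling censored data.

```python
import random
import statistics


def substitution_estimates(values, mdl):
    """Return (mean, variance) for several ways of treating values below mdl."""

    def summarize(xs):
        return statistics.mean(xs), statistics.variance(xs)

    return {
        # What we would get if the laboratory had reported every number.
        "uncensored": summarize(values),
        # Censored values replaced by zero.
        "zero": summarize([x if x >= mdl else 0.0 for x in values]),
        # Censored values replaced by half the MDL.
        "half_mdl": summarize([x if x >= mdl else 0.5 * mdl for x in values]),
        # Censored values replaced by the MDL itself.
        "mdl": summarize([x if x >= mdl else mdl for x in values]),
        # Censored values dropped from the data set.
        "omitted": summarize([x for x in values if x >= mdl]),
    }


# Hypothetical "true" concentrations; in practice these are unknown.
rng = random.Random(1)
true_values = [rng.lognormvariate(0.0, 1.0) for _ in range(500)]

results = substitution_estimates(true_values, mdl=0.5)
for name, (mean, var) in results.items():
    print(f"{name:>10}: mean={mean:.3f}  variance={var:.3f}")
```

Rerunning the sketch with a larger `mdl` censors a larger fraction of the values and lets the reader see how the gap between each substituted estimate and the uncensored one changes.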
