Statistics for Environmental Engineers


Figure 50.2 (top panel) shows the empirical distribution of 100 estimates of the 99th percentile made using a nonparametric method, each estimate being obtained from 100 values drawn at random from the lognormal distribution. The bottom panel of Figure 50.2 shows the distribution of 100 estimates made with the parametric method.
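The simulation behind Figure 50.2 can be sketched as follows. This is a minimal illustration, not the authors' code. The text gives only the true 99th percentile (13.2), so the log-scale standard deviation SIGMA = 1.0 is an assumption and MU is back-calculated from it. The nonparametric estimate uses NumPy's default percentile interpolation, which may differ from the ranking rule used in the book.

```python
import numpy as np

Z99 = 2.3263478740408408   # standard normal 0.99 quantile
TRUE_P99 = 13.2            # true 99th percentile from the text
SIGMA = 1.0                # ASSUMED log-scale standard deviation (hypothetical)
MU = np.log(TRUE_P99) - Z99 * SIGMA  # chosen so exp(MU + Z99*SIGMA) = 13.2

def parametric_p99(x):
    """Estimate p0.99 assuming lognormal data: exp(mean + z*sd) of ln(x)."""
    y = np.log(x)
    return np.exp(y.mean() + Z99 * y.std(ddof=1))

def nonparametric_p99(x):
    """Estimate p0.99 from the sample alone, with no distributional assumption."""
    return np.percentile(x, 99)

def simulate(n_trials=100, n=100, seed=1):
    """Draw n_trials samples of size n and estimate p0.99 both ways."""
    rng = np.random.default_rng(seed)
    par, nonpar = [], []
    for _ in range(n_trials):
        x = rng.lognormal(mean=MU, sigma=SIGMA, size=n)
        par.append(parametric_p99(x))
        nonpar.append(nonparametric_p99(x))
    return np.array(par), np.array(nonpar)

par, nonpar = simulate()
for name, est in (("parametric", par), ("nonparametric", nonpar)):
    print(f"{name:>13}: mean={est.mean():.2f}  sd={est.std(ddof=1):.2f}")
```

Histograms of `par` and `nonpar` correspond to the two panels; raising `n_trials` to 1000 gives the smoother picture described next.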

One hundred estimates give a rough, but informative, empirical distribution. Simulating one thousand estimates would give a smoother distribution, but it would still show that the parametric estimates are less variable than the nonparametric estimates and are distributed more symmetrically about the true 99th percentile value of p0.99 = 13.2 μg/L. The parametric method is better because it uses the information that the data come from a lognormal distribution, whereas the nonparametric method assumes no prior knowledge of the distribution (Berthouex and Hau, 1991).
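For reference, one standard way to write the two estimators for a sample x_1, ..., x_n. The parametric form takes logarithms, estimates the mean and standard deviation on the log scale, and back-transforms; the nonparametric form shown is one simple order-statistic convention (others interpolate between ranks).

```latex
y_i = \ln x_i, \qquad
\hat{p}_{0.99}^{\,\text{param}} = \exp\!\left(\bar{y} + z_{0.99}\, s_y\right),
\qquad z_{0.99} \approx 2.326

\hat{p}_{0.99}^{\,\text{nonpar}} = x_{(\lceil 0.99\,n \rceil)}
\quad \text{(the } \lceil 0.99\,n \rceil\text{-th smallest value; for } n = 100 \text{, the 99th)}
```

Only the parametric form uses the lognormal assumption, which is why its estimates scatter less.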

Although the true 99th percentile of 13.2 μg/L is well below the 18 μg/L limit, both estimation methods put at least 5% of their estimates above the limit (apparent violations) purely because of random error in sampling the distribution, and this is with a large sample size of n = 100. For a smaller sample size, the percentage of trials giving a violation will increase. The nonparametric estimation gives more, and larger, violations.
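The violation rates can be checked by counting how often each estimate exceeds the 18 μg/L limit for several sample sizes. This sketch reuses the same assumptions as before: SIGMA = 1.0 on the log scale is hypothetical, MU is set so the true 99th percentile is 13.2, and the nonparametric estimate uses NumPy's default interpolation. The exact percentages depend on these assumptions, so no particular output is claimed here.

```python
import numpy as np

Z99 = 2.3263478740408408        # standard normal 0.99 quantile
SIGMA = 1.0                     # ASSUMED log-scale standard deviation (hypothetical)
MU = np.log(13.2) - Z99 * SIGMA # true 99th percentile = 13.2 ug/L
LIMIT = 18.0                    # permit limit from the text, ug/L

def violation_rates(n, n_trials=10000, seed=2):
    """Fraction of trials in which each estimator of p0.99 exceeds LIMIT."""
    rng = np.random.default_rng(seed)
    x = rng.lognormal(MU, SIGMA, size=(n_trials, n))  # one row per trial
    y = np.log(x)
    par = np.exp(y.mean(axis=1) + Z99 * y.std(axis=1, ddof=1))
    nonpar = np.percentile(x, 99, axis=1)
    return (par > LIMIT).mean(), (nonpar > LIMIT).mean()

rates = {n: violation_rates(n) for n in (100, 50, 20)}
for n, (p, q) in rates.items():
    print(f"n={n:3d}: parametric {p:.1%}, nonparametric {q:.1%}")
```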

Bootstrap Sampling

The bootstrap method uses random resampling, with replacement, to create new sets of data (Metcalf, 1997; Draper and Smith, 1998). Suppose that we wish to determine confidence intervals for the parameters in a model by the bootstrap method. Fitting the model to a data set of size n will produce a set of n residuals. If the model is an adequate description of the data, the residuals are random errors. We can imagine that in a repeat experiment the residual of the original eighth observation might happen to become the residual of the third new observation, the original third residual might become the new sixth residual, and so on. This suggests how n residuals, drawn at random with replacement from the original set, can be assigned to the original observations (that is, added to the fitted values at the original settings of the independent variables) to create a new set of data. This requires that the original data be a random sample, so that the residuals are independent of each other.
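A minimal sketch of residual resampling for a straight-line model y = b0 + b1·x. The data are hypothetical, generated only for illustration, and the confidence intervals use the simple percentile method, which is one of several ways to form bootstrap intervals and not necessarily the one the cited authors recommend.

```python
import numpy as np

def bootstrap_line(x, y, n_boot=1000, seed=0):
    """Residual bootstrap for y = b0 + b1*x; returns fit and bootstrap params."""
    rng = np.random.default_rng(seed)
    b1, b0 = np.polyfit(x, y, 1)        # polyfit returns highest power first
    fitted = b0 + b1 * x
    resid = y - fitted
    n = len(y)
    boot = np.empty((n_boot, 2))        # columns: intercept, slope
    for k in range(n_boot):
        # draw n residuals at random, with replacement, and attach them
        # to the fitted values at the original x settings
        y_new = fitted + rng.choice(resid, size=n, replace=True)
        slope, intercept = np.polyfit(x, y_new, 1)
        boot[k] = (intercept, slope)
    return (b0, b1), boot

# Hypothetical data for illustration only
rng = np.random.default_rng(42)
x = np.arange(1.0, 11.0)
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.3, size=x.size)

(b0, b1), boot = bootstrap_line(x, y)
lo, hi = np.percentile(boot, [2.5, 97.5], axis=0)  # percentile-method 95% CI
print(f"b0 = {b0:.3f}, 95% CI [{lo[0]:.3f}, {hi[0]:.3f}]")
print(f"b1 = {b1:.3f}, 95% CI [{lo[1]:.3f}, {hi[1]:.3f}]")
```

Because each new data set keeps the original x values and only reshuffles the residuals, the spread of the refitted parameters reflects the random error alone, which is why the residuals must be independent.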
