Statistics for Environmental Engineers


What Does “Explained” Mean?


Caution is recommended in interpreting the phrase “R² explains the variation in the dependent variable.” R² is the proportion of variation in a variable Y that can be accounted for by fitting Y to a particular model instead of viewing the variable in isolation. R² does not explain anything in the sense of “Aha! Now we know why the response indicated by y behaves the way we have observed in this set of data.” If the data are from a well-designed controlled experiment, with proper replication and randomization, it is reasonable to infer that a significant association of the variation in y with variation in the level of x is a causal effect of x. If the data are observational, what Box (1966) calls happenstance data, there is a high risk of a causal interpretation being wrong. With observational data there can be many reasons for associations among variables, only one of which is causality.
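The contrast between happenstance data and a randomized experiment can be illustrated with a small simulation. This is only a sketch under invented assumptions: the lurking variable z, the coefficients, and the noise levels are hypothetical and do not come from the text. It relies on the fact that, for a straight-line fit, R² equals the squared correlation coefficient between x and y.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical lurking variable z drives y. In this simulation x has
# no causal effect on y at all.
z = rng.normal(size=n)
y = 2.0 * z + rng.normal(scale=0.5, size=n)

# Happenstance data: x is merely observed, and it happens to track z.
x_observed = z + rng.normal(scale=0.5, size=n)

# Designed experiment: the levels of x are assigned at random,
# so they are independent of z.
x_assigned = rng.normal(size=n)

# For a straight-line fit, R^2 is the squared correlation coefficient.
r2_observational = np.corrcoef(x_observed, y)[0, 1] ** 2
r2_randomized = np.corrcoef(x_assigned, y)[0, 1] ** 2

print(f"R2, observational x: {r2_observational:.2f}")
print(f"R2, randomized x:    {r2_randomized:.2f}")
```

In this simulated setting the observational R² is large even though x does nothing to y, while the randomized R² is close to zero. A high R² from happenstance data is therefore compatible with no causal effect at all.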


A value of R² is not just a rescaled measure of variation. It is a comparison between two models. One of the models is usually referred to as “the model.” The other, the null model, is rarely mentioned. The null model (y = β₀) provides the reference for comparison. It describes a horizontal line at the level of the mean of the y values, which is the simplest possible model that could be fitted to any set of data.
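A minimal sketch of this two-model comparison follows. It assumes the standard definition R² = 1 − RSS_model / RSS_null, which is not stated in the passage above. The data are made up for illustration, a straight line stands in for "the model," and the name rss_null is chosen here rather than taken from the text.

```python
import numpy as np

# Made-up illustration data (not from the text).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.3, 2.9, 4.1, 4.8, 6.2, 6.6])

# Null model y = b0: the least-squares estimate of b0 is the mean of y,
# so the fitted "curve" is a horizontal line at the mean.
b0_null = y.mean()
rss_null = np.sum((y - b0_null) ** 2)

# "The model": here a straight line y = b0 + b1*x fitted by least squares.
# np.polyfit returns coefficients from highest degree to lowest.
b1, b0 = np.polyfit(x, y, 1)
rss_model = np.sum((y - (b0 + b1 * x)) ** 2)

# R^2 compares the two residual sums of squares.
r2 = 1.0 - rss_model / rss_null

print(f"RSS null  = {rss_null:.3f}")
print(f"RSS model = {rss_model:.3f}")
print(f"R2        = {r2:.3f}")
```

Moving the horizontal line away from the mean of y can only increase the null model's residual sum of squares, which is why the mean serves as the reference level.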


•    The model (y = β₀ + β₁x + β₂x² + … + eᵢ) has residual sum of squares Σ(yᵢ − ŷᵢ)² = RSS_model.
