Statistics for Environmental Engineers

What Does “Explained” Mean?

Caution is recommended in interpreting the phrase “R² explains the variation in the dependent variable.” R² is the proportion of the variation in a variable y that can be accounted for by fitting y to a particular model instead of viewing the variable in isolation. R² does not explain anything in the sense of “Aha! Now we know why the response indicated by y behaves the way we have observed in this set of data.” If the data are from a well-designed controlled experiment, with proper replication and randomization, it is reasonable to infer that a significant association between the variation in y and the variation in the level of x is a causal effect of x. If the data are observational, what Box (1966) calls happenstance data, there is a high risk that a causal interpretation will be wrong. With observational data there can be many reasons for associations among variables, only one of which is causality.
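The risk can be made concrete with a small simulation. The sketch below is a hypothetical illustration, not taken from the source, and assumes NumPy is available. A lurking variable z drives both x and y, while x has no effect on y at all; the means, slopes, noise levels, and seed are arbitrary assumptions. Fitting y to x by least squares still gives a large R², which is the kind of association that happenstance data can produce without any causal link.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # fixed seed so the run is repeatable
n = 200

# Hypothetical lurking variable z; both x and y depend on z, not on each other.
z = rng.normal(loc=20.0, scale=5.0, size=n)
x = 0.5 * z + rng.normal(scale=0.5, size=n)  # x is driven by z only
y = 3.0 * z + rng.normal(scale=2.0, size=n)  # y is driven by z only; x has no effect on y

# Least-squares straight line y = b0 + b1*x (polyfit returns the highest power first)
b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x

# R^2: residuals about the fitted line compared with residuals about the mean of y
r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"R^2 of y on x: {r2:.2f}")  # large, yet changing x would not change y
```

A controlled experiment in which x is set by the experimenter, independently of z, would break this spurious association; the observational fit alone cannot distinguish it from a real effect.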

A value of R² is not just a rescaled measure of variation. It is a comparison between two models. One of them is usually referred to as the model. The other, the null model, is rarely mentioned. The null model, $y = \beta_0$, provides the reference for comparison. It describes a horizontal line at the level of the mean of the y values, which is the simplest possible model that could be fitted to any set of data.
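The two-model comparison can be written as $R^2 = 1 - \mathrm{RSS}_{\text{model}} / \mathrm{RSS}_{\text{null}}$, where $\mathrm{RSS}_{\text{null}} = \sum (y - \bar{y})^2$ is the residual sum of squares about the horizontal line of the null model (the subscript "null" is notation assumed here). The Python sketch below assumes NumPy and uses made-up data; the function name `r_squared` is hypothetical. It fits a straight line with `numpy.polyfit` and compares its residuals with those of the null model.

```python
import numpy as np

def r_squared(y, y_hat):
    """Hypothetical helper: R^2 as a comparison of a fitted model with the null model y = b0.

    The null model predicts mean(y) for every observation, so its residual
    sum of squares is sum((y - mean(y))**2).
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    rss_model = np.sum((y - y_hat) ** 2)   # residuals about the fitted model
    rss_null = np.sum((y - y.mean()) ** 2)  # residuals about the horizontal line at mean(y)
    return 1.0 - rss_model / rss_null

# Made-up data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b1, b0 = np.polyfit(x, y, deg=1)  # straight-line model y = b0 + b1*x
y_hat = b0 + b1 * x
print(round(r_squared(y, y_hat), 4))
```

Passing the null model's own predictions, `np.full_like(y, y.mean())`, returns 0: a model that does no better than the mean accounts for none of the variation. For a straight line fitted with an intercept, the result also equals the squared correlation coefficient of x and y.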

•    The model ($y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + e$) has residual sum of squares $\sum (y - \hat{y})^2 = \mathrm{RSS}_{\text{model}}$.

