57-1 From ANOVA to GLM to GLIMMIX: Eighty Years of Statistical Modeling in Agronomic Research.

Division: A11 Biometry
Session: Symposium--PROC ANOVA, GLM, MIXED, and GLIMMIX / Div. A11 Business Meeting
Monday, November 1, 2010: 8:35 AM
Hyatt Regency Long Beach, Seaview Ballroom C, First Floor

Walter W. Stroup, Statistics, University of Nebraska, Lincoln, NE
In 1921, R.A. Fisher’s “Studies in Crop Variation” introduced the analysis of variance (ANOVA) to the research world. Over the next several decades, ANOVA became institutionalized as the “gold standard” for formal assessment of experimental data, and the “ANOVA mindset” to a large extent still drives the statistical design and analysis of research experiments in the plant and soil sciences. While ANOVA was a groundbreaking advance by 1921 standards, it was limited by the fact that pencil and paper, or at best crude mechanical calculators, were the state of the art in statistical computing. In the 1940s and 1950s, ANOVA was placed in the context of linear model theory, and in the 1970s, advances in computers made linear model software such as SAS PROC GLM possible. While such software increased the speed and flexibility of data analysis, the linear model theory underlying GLM had important limitations. These limitations are especially noticeable in experiments with split-plot features or with correlation in time or space (see the first sketch following the abstract).

In the 1980s and 1990s, computing technology and statistical modeling mutually enabled and drove one another, dramatically altering the design and analysis landscape. The “ANOVA mindset” (at least as widely understood today) holds that observations must be independent and normally distributed with equal variances among treatment groups, and that data failing to meet these assumptions must be transformed: in essence, the ANOVA hammer is the only tool, and all data must be made to look like a nail. (As an aside, Fisher would not have understood the contemporary version of the “ANOVA mindset” and would have rejected it.) In reality, research data do not follow these constraints. Counts, rates, proportions, and times to event, to name just a few types of data, are well known NOT to be normally distributed, and agronomic data are often NOT independent but are correlated in time or space.

Contemporary models have for all intents and purposes rendered the “ANOVA mindset” obsolete: generalized linear models do not assume normality; linear mixed models do not assume independence or equal variance; and generalized linear mixed models, as the name implies, combine the capabilities of generalized and mixed models (see the second sketch following the abstract). Nonlinear mixed models advance what is possible even further. Contemporary models also challenge conventional wisdom about design: designs well suited to independent, normal data are often inappropriate, and often catastrophically so, for non-normal or correlated data. Properly exploiting the potential of contemporary models and design strategies can dramatically improve the information-per-dollar ratio, an increasingly important consideration in an era of ever-tightening budgets.

This talk focuses on the changes in statistical modeling, concentrating on the past thirty-five years from the introduction of PROC GLM through the introduction of generalized, mixed, and nonlinear models. Common agronomic experimental formats are used to illustrate the changes, and both analysis and implications for design are considered. The talk concludes with a look ahead at what design and analysis might look like a few years from now, along with suggestions for how the agronomic research community can turn these changes to its advantage.
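The two sketches below are illustrative only; they are not taken from the talk, and the data sets and variable names in them are hypothetical. Sketch 1 shows the mixed-model analysis of a split-plot experiment, the setting where GLM’s linear model theory falls short: PROC GLM can be told, via a TEST statement, to test whole-plot effects against the whole-plot error, but its standard errors for means and comparisons are still computed from the single residual variance, whereas PROC MIXED recovers both error strata once the random effects are declared.

    /* Sketch 1: split-plot analysis with PROC MIXED. Hypothetical
       data set SP: whole-plot factor A randomized to whole plots
       within blocks, split-plot factor B randomized to subplots. */
    proc mixed data=sp;
       class block a b;
       model y = a b a*b;       /* fixed treatment effects */
       random block block*a;    /* block and whole-plot error strata */
    run;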
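Sketch 2 is a similarly hedged illustration of a generalized linear mixed model in PROC GLIMMIX, assuming binomial data (r events out of n observations per plot) from treatments in a randomized complete block design: the binomial distribution and logit link are modeled directly, with no normalizing transformation, and blocks enter as random effects.

    /* Sketch 2: generalized linear mixed model with PROC GLIMMIX.
       Hypothetical data set RCB: r events out of n per plot. The
       events/trials syntax implies a binomial distribution with a
       logit link. */
    proc glimmix data=rcb;
       class block trt;
       model r/n = trt;          /* binomial response, logit link */
       random block;             /* blocks as random effects */
       lsmeans trt / ilink diff; /* treatment means on the data scale */
    run;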