Aguirre Lab Home Page: ancova tutorial (2024)

TUTORIAL: ANCOVA in R

The following is a tutorial on how to conduct an ANCOVA in R. Like ANOVA, ANCOVA tests whether a variable differs between two or more groups. Unlike ANOVA, however, ANCOVA allows one to conduct this test while accounting for a covariate (a variable that may covary with the variable of interest). This is useful in many contexts, including when testing for differences in morphological characters between groups whose individuals differ in overall body size. In this scenario, the covariate would be body size, and the ANCOVA would allow one to test whether a morphological character (the dependent variable) varies among groups when accounting for potential differences in body size between groups. If one group is larger than the other, most morphological characters are expected to be larger in the larger group, so a test that does not take variation in body size into account would likely produce statistically significant results that are biologically trivial.

Before beginning, you should have R, RStudio, and ggplot2 downloaded and ready for use. See my Beginning Work in R Tutorial.

Note that the code is included in the gray boxes below to make it easy to cut and paste. The explanations are interspersed in regular html.

To begin, you will need to set the working directory, open the data file, and attach the variables. To set the working directory, use the setwd function. Select the location on your computer where you created a folder for the data and output files. R will look for your data file and save output files in this folder. I created a folder specifically for this ANCOVA tutorial in my R_work folder.


setwd("C:/1awinz/R_work/ancova_tutorial")
getwd()

setwd() prints nothing when it succeeds, so follow it with getwd(), which should list the correct working directory as output once you hit enter.

Then we will open our data file. Data files can be generated in Excel by saving spreadsheets as tab-delimited text files. My example data file is called "data_RS_SL_HL.txt". It includes head length (HL) and standard length (SL) data for male (M) and female (F) anadromous threespine stickleback from Rabbit Slough in the Cook Inlet region of Alaska. The data are taken from a study published by Aguirre and Akinpelu (2010) on sexual dimorphism of head length in threespine stickleback. The data file is available on GitHub if you want to download it and use it in this tutorial. This data file has headers (variable names).

If you are using your own data file, make sure to indicate whether it has headers, then attach the data so that the variables can be referred to by name, and list the variable names.


read.table("data_RS_SL_HL.txt", header=TRUE)
data<-read.table("data_RS_SL_HL.txt", header=TRUE)
attach(data)
names(data)

The first command, "read.table("data_RS_SL_HL.txt", header=TRUE)", should result in a listing of your data. The second command, "data<-read.table("data_RS_SL_HL.txt", header=TRUE)", assigns the name "data" to the data table; header=TRUE indicates that the first row lists the variable names (TRUE can be abbreviated as T, but T can be overwritten, so spelling it out is safer). The third command, "attach(data)", makes the columns of the data table available by name, so you can type SL instead of data$SL. The final command, "names(data)", lists the variable names. You should see "Sex" "Spec" "SL" "HL" as output for the variable names if you are using my example data file.
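If you do not have the example file at hand, the following minimal sketch shows the same reading steps on a small made-up file. Everything in it is an assumption for illustration: the values are simulated rather than the Rabbit Slough measurements, the file name sim_SL_HL.txt is hypothetical, and Spec is assumed to be a specimen number.

```r
# Hypothetical stand-in for the tutorial's data file (simulated values).
set.seed(1)
sim <- data.frame(
  Sex  = rep(c("F", "M"), each = 5),
  Spec = 1:10,                                   # assumed: specimen number
  SL   = round(rnorm(10, mean = 75, sd = 5), 1),
  HL   = round(rnorm(10, mean = 21, sd = 1), 1)
)

# Write a tab-delimited text file with a header row, similar to what
# Excel produces when a spreadsheet is saved as tab-delimited text.
path <- file.path(tempdir(), "sim_SL_HL.txt")
write.table(sim, path, sep = "\t", row.names = FALSE, quote = FALSE)

# Read it back and check it before running any analysis.
dat <- read.table(path, header = TRUE)
names(dat)   # variable names taken from the header row
str(dat)     # column types: SL and HL should be numeric
head(dat)    # first few rows
```

Checking str() right after reading is a cheap way to catch a measurement column that was read as text because of a stray character in the spreadsheet.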

Now that we have our data ready, let's talk about the analysis of covariance (ANCOVA). ANCOVA is used to test whether a variable differs between two or more groups while taking into account a covariate, another variable that may covary with the variable of interest. It can be thought of as a combination of ANOVA and regression because one tests for differences in a continuous variable between two or more groups (like ANOVA) while controlling for another continuous variable, the covariate (like regression). So one can generate regression plots in which there are multiple groups.
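To make the idea of a regression plot with multiple groups concrete, here is a minimal sketch that draws one fitted line per group. It assumes ggplot2 is installed and uses simulated values with made-up parameters (not the Rabbit Slough data); only the column names Sex, SL, and HL follow the example file.

```r
library(ggplot2)

# Simulated (hypothetical) data: two groups with the same slope of HL on SL
# but a different intercept.
set.seed(1)
n <- 100
sim <- data.frame(
  Sex = factor(rep(c("F", "M"), each = n)),
  SL  = c(rnorm(n, mean = 80, sd = 4), rnorm(n, mean = 72, sd = 4))
)
sim$HL <- 2 + 0.25 * sim$SL + 1.5 * (sim$Sex == "M") + rnorm(2 * n, sd = 0.5)

# Mapping color to Sex in ggplot() makes geom_smooth() fit a separate
# straight line (method = "lm") for each group.
p <- ggplot(sim, aes(x = SL, y = HL, color = Sex)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  theme_classic()
p
```

ANCOVA asks whether the lines in a plot like this differ in slope and, if they do not, whether they differ in elevation.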

For example, does head length differ between male and female stickleback? Fish grow throughout their lives, most morphological traits are correlated with body size in fishes, and stickleback females are significantly larger than males, so one cannot simply use a t-test to answer this question. If females are larger than males, one would expect them to have larger heads simply because they are larger in body size. One could also have sampled more large males by chance, resulting in a sample in which males are larger than females. In either case, one would expect head length to differ between sexes simply because of the difference in body size, and a significant result from an analysis that does not account for differences in body length would be a trivial finding. What we would really like to know is whether head length differs between males and females of the SAME body length, i.e., controlling for body length differences between groups.
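The problem with the t-test can be shown with a short simulation. This is only a sketch under stated assumptions: the data are made up so that females are larger on average and head length depends on body length alone, with no sex effect built in.

```r
# Simulated (hypothetical) data: HL depends on SL only, and females have
# a mean SL that is 10 units larger than males.
set.seed(42)
n <- 200
sim <- data.frame(
  Sex = factor(rep(c("F", "M"), each = n)),
  SL  = c(rnorm(n, mean = 80, sd = 4), rnorm(n, mean = 70, sd = 4))
)
sim$HL <- 2 + 0.25 * sim$SL + rnorm(2 * n, sd = 0.5)

# A t-test on raw head length picks up the difference in body size.
raw_diff <- diff(tapply(sim$HL, sim$Sex, mean))   # mean(M) minus mean(F)
t.test(HL ~ Sex, data = sim)

# With SL in the model, the Sex coefficient estimates the difference in HL
# between males and females of the same SL.
fit <- lm(HL ~ Sex + SL, data = sim)
adj_diff <- coef(fit)["SexM"]

raw_diff   # near -2.5, i.e. 0.25 times the 10-unit difference in mean SL
adj_diff   # near 0, because no sex effect was simulated
```

The raw difference is real but only reflects body size. Once SL is accounted for, the estimated difference between sexes is close to zero, which is what was built into the simulation.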

Let's work through an example. We will take data from a study by Aguirre and Akinpelu (2010), who examined sexual dimorphism in head length of several populations of Alaskan threespine stickleback. For this example, we will use head length (HL) and standard length (SL) data from male and female anadromous stickleback collected in Rabbit Slough (RS). Does HL differ between sexes when you account for body size differences?

First, let's compute the mean standard length and head length for each sex using the aggregate function. Type in the following commands:


aggregate(SL, list(Sex), mean)
aggregate(HL, list(Sex), mean)

These commands tell R to group our specimens by Sex and list the mean of SL and of HL for each sex. You should get output like this:

[Output: table of mean SL and mean HL for each Sex]

The table lists the mean SL and HL for each Sex. Females are larger than males in body size, as expected, but HL is actually larger in males. Now we can use a scatter plot to plot HL against SL and color the points by sex to visually inspect the relationship between these variables. We will use ggplot2, so load that package in RStudio first (check its box under the Packages tab in the bottom right window, or run library(ggplot2)). You can learn more about making scatter plots in my tutorial on this: Scatter Plot Tutorial.


ggplot(data, aes(x=SL, y=HL)) + geom_point(aes(color=Sex))+ theme_classic()

[Figure: scatter plot of HL against SL, with points colored by Sex]


The scatter plot shows that males appear to have larger heads than females across the range of body sizes present in this sample. Now we can conduct a formal test using ANCOVA. To do so, we will use the linear model, lm(), function. We will name our model "ancova1" and type the following commands:


ancova1<-lm(HL~Sex*SL)
anova(ancova1)

Which gives the following ANOVA table as output:

[Output: ANOVA table for the ancova1 model]

The first line of code uses the lm() function to create a linear model depicting how the response variable, HL, varies as a function of our treatment variable, Sex, our covariate, SL, and the interaction Sex:SL. HL~Sex*SL is shorthand that tells R to create a full model with both predictors and their interaction (Sex+SL+Sex:SL). The ANOVA table summarizes the results of the linear model and shows that the Sex:SL interaction is not statistically significant. In other words, there is no evidence that the slope of the relationship between HL and SL differs between males and females, so we can treat the fit lines as parallel.
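It can help to see how the coefficients of the full model map onto one line per sex. The sketch below assumes simulated data with made-up parameters (not the Rabbit Slough measurements) and passes the data frame through the data argument of lm() instead of relying on attach().

```r
# Simulated (hypothetical) data with the tutorial's column names.
set.seed(1)
n <- 100
sim <- data.frame(
  Sex = factor(rep(c("F", "M"), each = n)),
  SL  = c(rnorm(n, mean = 80, sd = 4), rnorm(n, mean = 72, sd = 4))
)
sim$HL <- 2 + 0.25 * sim$SL + 1.5 * (sim$Sex == "M") + rnorm(2 * n, sd = 0.5)

full <- lm(HL ~ Sex * SL, data = sim)
b <- coef(full)
b
# (Intercept) : intercept of the female line (F is the reference level)
# SexM        : male intercept minus female intercept
# SL          : slope of the female line
# SexM:SL     : male slope minus female slope (what the Sex:SL term tests)

# Intercept and slope of each sex's own line, rebuilt from the coefficients.
lines_by_sex <- rbind(
  F = c(intercept = b[["(Intercept)"]],
        slope     = b[["SL"]]),
  M = c(intercept = b[["(Intercept)"]] + b[["SexM"]],
        slope     = b[["SL"]] + b[["SexM:SL"]])
)
lines_by_sex

# The male row matches a regression fitted to the males alone.
males_only <- coef(lm(HL ~ SL, data = subset(sim, Sex == "M")))
males_only
```

R picks the reference level alphabetically, so F is the baseline here and every Sex coefficient is expressed as males relative to females.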

Now we can simplify the model by removing the interaction term (Sex:SL) and testing whether there is a significant loss of predictive power in the model without it:


ancova2<-update(ancova1,~ .- Sex:SL)
anova(ancova1,ancova2)

Which gives the following output:

[Output: comparison of the ancova1 and ancova2 models]

The table lists the two models that are being compared (the full model and the model without the interaction), and the P value (Pr(>F)) of 0.6343 indicates that removing the interaction does not significantly reduce the explanatory power of the model. Now we can update our ancova2 model (the model without the interaction) to see if removing Sex makes a difference. We can remove it using the same procedure as above and then compare the ancova2 and our new ancova3 models:


ancova3<-update(ancova2,~ .- Sex)
anova(ancova2,ancova3)

Which gives the following output:

[Output: comparison of the ancova2 and ancova3 models]

Again, the table lists the two models that are being compared, HL as a function of Sex + SL and HL as a function of just SL. The P value of < 2.2e-16 indicates that removing Sex makes a huge difference. Although HL is clearly strongly correlated with SL, HL also differs significantly between male and female stickleback even when variation in SL is accounted for.
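As a recap, here is a minimal sketch of the whole sequence run on simulated data, ending with an estimate of how much HL differs between the sexes at the same SL. The data and the model names (m_full, m_par, m_size) are assumptions for illustration, and every function is given data = sim rather than relying on attach().

```r
# Simulated (hypothetical) data: males get a head 1.5 units longer at any SL.
set.seed(1)
n <- 100
sim <- data.frame(
  Sex = factor(rep(c("F", "M"), each = n)),
  SL  = c(rnorm(n, mean = 80, sd = 4), rnorm(n, mean = 72, sd = 4))
)
sim$HL <- 2 + 0.25 * sim$SL + 1.5 * (sim$Sex == "M") + rnorm(2 * n, sd = 0.5)

# Mean SL and HL by sex in a single call.
aggregate(cbind(SL, HL) ~ Sex, data = sim, FUN = mean)

m_full <- lm(HL ~ Sex * SL, data = sim)   # Sex + SL + Sex:SL
m_par  <- update(m_full, ~ . - Sex:SL)    # parallel lines: Sex + SL
m_size <- update(m_par, ~ . - Sex)        # body size only: SL

anova(m_full, m_par)    # is the interaction needed?
anova(m_par, m_size)    # is Sex needed once SL is in the model?

# Predicted HL for a female and a male of the same (average) SL.
same_size <- data.frame(
  Sex = factor(c("F", "M"), levels = levels(sim$Sex)),
  SL  = mean(sim$SL)
)
pred <- predict(m_par, newdata = same_size)
names(pred) <- c("F", "M")
pred
diff(pred)                 # male minus female HL at equal SL
coef(m_par)["SexM"]        # the same quantity, read off the model
confint(m_par)["SexM", ]   # 95% confidence interval for that difference
```

The P value says whether the sexes differ; the SexM coefficient and its confidence interval say by how much, in the units of HL, for fish of the same body length.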


SUGGESTED READING:

For an R focused treatment of ANCOVA, see:

-Crawley, M.J. 2015. Statistics, an introduction using R. John Wiley & Sons. West Sussex.

OTHER REFERENCES CITED:

-Aguirre, W.E., and O. Akinpelu. 2010. Sexual dimorphism of head morphology in threespine stickleback. Journal of Fish Biology 77:802-821.
