Analysis for the Guppy Learning over WiFi (GLoW) Experiment

Brief Overview

This is the analysis for an experiment conducted by Beatriz Quineche as part of an Honours thesis in the Reader Laboratory at McGill University. The data and R code to produce this analysis are in the beatriz-master-data_one.csv.csv file and the learning-chamber-analysis.Rmd file respectively. A list of variables and their descriptions is given in the metadata section of the README file. These files can all be accessed at the GitHub repository for this project.

The goal of this project was to conduct a proof-of-concept experiment to test the feasibility of training guppies in a conditioning chamber, which consisted of a tank supplemented with Raspberry Pi controlled lights and feeders as well as a Raspberry Pi camera. Guppies were given a simple classical conditioning task. Four times a day for three consecutive days a light illuminated for five seconds, after which the light turned off and food was delivered from a servo-powered feeder hung above the tank (see Apparatus Design). This resulted in a total of 12 reinforced trials over the span of three days. On the fifth day guppies were food deprived to increase food motivation and thus the likelihood of eliciting a demonstrable behavioural effect in the probe trial, which occurred on the sixth day.

The association expected to be formed was that the side of the tank that produces a light predicts which feeder will have food delivered to it. Guppies were expected to demonstrate evidence of successfully learning the association by behaving in the following manner after the light turned on: 1) increasing the amount of time spent on the side of the tank where the feeder was located, 2) increasing the number of visits (coming within 2 body lengths) to the rewarded feeder, and/or 3) approaching the rewarded feeder more quickly than the unrewarded feeder.

Data check

Checking categorical variables

First we want to make sure the data are properly formatted. We will use the describe_all_cat function from the tidyext package to do so. This summarises all of our categorical variables. Note that id was filtered out of this table.

Table 1: Summary of the frequency of observations for all categorical variables. The levels of each variable are given in the Group column and the frequency of observations within each level is given in the Frequency column. The relative contribution of a particular level of a variable to the total number of observations within that variable is given in the % column.
Variable	Group	Frequency	%
batch	2	12	35
batch	3	12	35
batch	1	10	29
light.status	1before	17	50
light.status	2after	17	50
light.side	right	22	65
light.side	left	12	35
size	large	18	53
size	small	16	47

Of note is that we have more, but not significantly more (exact binomial test, p = 0.121), observations for which the light shone on the right side (22 of 34, 65%) than on the left side (12 of 34, 35%). All other grouping levels are essentially equal.
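
The counts in Table 1 are per observation (34 rows, two per guppy), so the reported p-value can be checked against both the observation-level counts and the individual-level counts (11 right vs 6 left) mentioned later in this document. The following is a minimal sketch using base R's binom.test(); the object names are illustrative and are not taken from the analysis file.

```r
# Counts from Table 1 (observations) and from the discussion (individuals).
obs_right <- 22; obs_total <- 34   # two rows per guppy
ind_right <- 11; ind_total <- 17   # one count per guppy

# Exact two-sided binomial tests against an even 50:50 split.
p_obs <- binom.test(obs_right, obs_total, p = 0.5)$p.value
p_ind <- binom.test(ind_right, ind_total, p = 0.5)$p.value

round(c(observations = p_obs, individuals = p_ind), 3)
```

With these counts the observation-level test gives p = 0.121, matching the value reported above, while counting each guppy once gives a larger p-value (about 0.332). Neither suggests a significant imbalance.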

Checking individual observations

Next we will check the data for individuals. All individuals should have two measures, one for the ‘before light’ phase and one for the ‘after light’ phase.

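Table 2: Frequency of observations for each individual id. Each individual should contribute two observations, one per light phase, with each individual accounting for 6% of the 34 observations.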
Variable	Group	Frequency	%
id	large_pi1_w1	2	6
id	large_pi1_w2	2	6
id	large_pi1_w3	2	6
id	large_pi2_w1	2	6
id	large_pi2_w2	2	6
id	large_pi2_w3	2	6
id	large_pi3_w1	2	6
id	large_pi3_w2	2	6
id	large_pi3_w3	2	6
id	small_pi1_w1	2	6
id	small_pi1_w2	2	6
id	small_pi1_w3	2	6
id	small_pi2_w1	2	6
id	small_pi2_w2	2	6
id	small_pi2_w3	2	6
id	small_pi3_w2	2	6
id	small_pi3_w3	2	6
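
The check in Table 2 can also be done programmatically. The sketch below rebuilds the id list from Table 2, confirms that every id has exactly one row per light phase, and compares the ids against a fully crossed set of labels. The crossed design (2 sizes x 3 pi labels x 3 w labels) is an assumption made for illustration, as are the object names.

```r
# Individual ids as listed in Table 2 (each contributes two rows in the data).
ids <- c(
  paste0("large_pi", rep(1:3, each = 3), "_w", 1:3),
  "small_pi1_w1", "small_pi1_w2", "small_pi1_w3",
  "small_pi2_w1", "small_pi2_w2", "small_pi2_w3",
  "small_pi3_w2", "small_pi3_w3"
)

# Toy stand-in for the real data: one row per id and light phase.
toy_data <- data.frame(
  id = rep(ids, each = 2),
  light.status = rep(c("1before", "2after"), times = length(ids))
)

# Every id should appear exactly once in each light phase.
counts <- table(toy_data$id, toy_data$light.status)
incomplete_ids <- rownames(counts)[rowSums(counts == 1) != 2]

# Assumption: a fully crossed labelling of 2 sizes x 3 pi labels x 3 w labels.
design <- expand.grid(size = c("large", "small"), pi = 1:3, w = 1:3)
expected_ids <- paste0(design$size, "_pi", design$pi, "_w", design$w)
missing_ids <- setdiff(expected_ids, ids)

incomplete_ids  # ids lacking a before or after row
missing_ids     # labels from the assumed design that are absent from Table 2
```

Under that assumption, the only label absent from Table 2 is small_pi3_w1, which would account for there being 17 rather than 18 individuals.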

Checking total time values

Since we used automated tracking we want to make sure that there are minimal tracking errors. We measured the time a guppy was on either the left side or right side of the tank, so we can sum these two measures to check for any missing data. Since trials lasted 3 minutes, the two measures should add up to around 180 seconds.

Table 3: Checking tracking data errors
id	light.status	total.time
large_pi2_w1	2after	175.6
small_pi2_w1	1before	175.6
small_pi3_w2	2after	177.8
small_pi2_w2	1before	178.0
small_pi1_w1	2after	178.2
small_pi1_w3	1before	178.2
large_pi3_w1	2after	178.6
large_pi3_w2	2after	178.8
large_pi2_w2	1before	179.2
small_pi1_w2	1before	179.2
small_pi2_w2	2after	179.2
large_pi3_w3	1before	179.4
small_pi3_w3	1before	179.6
large_pi1_w2	2after	179.6
small_pi1_w1	1before	179.6
large_pi1_w2	1before	179.8
large_pi3_w3	2after	179.8
small_pi1_w3	2after	179.8
large_pi1_w3	1before	180.0
large_pi3_w2	1before	180.0
large_pi1_w1	1before	180.2
large_pi1_w1	2after	180.2
large_pi1_w3	2after	180.2
large_pi2_w1	1before	180.2
large_pi2_w2	2after	180.2
large_pi2_w3	1before	180.2
large_pi3_w1	1before	180.2
small_pi1_w2	2after	180.2
small_pi2_w1	2after	180.2
small_pi2_w3	2after	180.2
small_pi3_w2	1before	180.2
small_pi3_w3	2after	180.2
large_pi2_w3	2after	180.2
small_pi2_w3	1before	180.2
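
A simple way to screen Table 3 is to flag trials whose total tracked time deviates from the 180 second trial length by more than some tolerance. The sketch below uses a few rows copied from Table 3; the 3 second tolerance is a hypothetical cut-off chosen for illustration, not one used in the analysis.

```r
# A few rows copied from Table 3.
tracking <- data.frame(
  id = c("large_pi2_w1", "small_pi2_w1", "small_pi3_w2", "large_pi1_w3"),
  light.status = c("2after", "1before", "2after", "1before"),
  total.time = c(175.6, 175.6, 177.8, 180.0)
)

trial_length <- 180  # seconds, from the text
tolerance <- 3       # seconds; hypothetical cut-off, not from the analysis

# Time unaccounted for by the tracker in each trial.
tracking$missing.time <- trial_length - tracking$total.time

# Trials that deviate from the trial length by more than the tolerance.
flagged <- tracking[abs(tracking$missing.time) > tolerance, ]
flagged
```

With this cut-off only the two 175.6 second trials are flagged; every trial in Table 3 is within 4.4 seconds of the expected total.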

Models

We analysed the data using linear mixed effect and generalized linear mixed effect models with the lmer() and glmer() functions from the lme4 package. P-values and effective degrees of freedom were obtained using the lmerTest package. Model residuals were checked against distributional assumptions using the residual diagnostic plots produced by the DHARMa package for each model.

Model 1 - Time on the rewarded side

To determine whether individuals increase their time spent on the side of the tank the light shone from, we fit a linear mixed effects model with a fixed effect of light status (before or after) and a random effect of individual id. Our response variable, ‘rewarding side preference’, is the amount of time a guppy spent on the side which the light shone from minus the time spent on the other side of the tank. This model asks whether the preference for the rewarding side of the tank changed between baseline and test.

time_model <-
  lmer(rewarding.side.preference ~ light.status + (1 | id),
    data = full_data
  )
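
The response variable for Model 1 can be derived from the per-side times as sketched below. The input column names (time.left, time.right) and the values are hypothetical; only light.side and rewarding.side.preference appear in this document.

```r
# Hypothetical per-trial side times in seconds; values are made up.
toy <- data.frame(
  light.side = c("left", "right", "right"),
  time.left  = c(100, 60, 95),
  time.right = c(80, 120, 85)
)

# Time on the illuminated (rewarding) side and on the opposite side.
rewarded_time   <- ifelse(toy$light.side == "left", toy$time.left, toy$time.right)
unrewarded_time <- ifelse(toy$light.side == "left", toy$time.right, toy$time.left)

# Positive values mean more time was spent on the rewarding side.
toy$rewarding.side.preference <- rewarded_time - unrewarded_time
toy
```

Because the two side times sum to roughly 180 seconds, this measure ranges from about -180 (all time on the unrewarded side) to about 180 (all time on the rewarding side).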

Results

Table 4: Summary of a linear mixed effect model (Model 1) estimating the preference for the side of the tank where the light had been activated, in seconds (model estimates ± S.E.), where the fixed effect is the factor light status (‘before the light’ or ‘after the light’) and the random effect is individual id. The effect of light status is non-significant, but the estimate indicates guppies increased their preference for the rewarding side after the light came on.
Factor	Estimate	Std. Error	T statistic	df	P value
Intercept	-5.918	21.084	-0.281	29.473	0.781
Light status	27.671	25.075	1.104	16.000	0.286

The effect of light status is non-significant (p = 0.286). After the light had shone, the preference of guppies for the illuminated side increased by an estimated 27.7 seconds, but this change is not statistically distinguishable from zero.

Figure 1: Data are means ± SE. Lines connect individuals across light periods. The dashed line represents the value of the pre-light baseline behavioural measure.

Model 2 - Latency to the rewarded side

To determine whether individuals increase the speed at which they come within two body lengths of the rewarded feeder relative to the unrewarded feeder after the light turns on, we fit a linear mixed effects model. Our response variable, latency.difference, is the latency to approach the unrewarded feeder minus the latency to approach the rewarded feeder. Positive values indicate that the rewarded feeder was approached more quickly than the unrewarded feeder. A value of 0 would indicate both feeders were reached at the same time. Since it is impossible for a guppy to be in two places at once, in practice a value of 0 means that neither feeder was approached and both latencies were assigned the maximum score of 180 seconds. The fixed effect is light status, which is either ‘before the light turns on’ or ‘after the light turns on’. We additionally fit a random effect of individual id to account for repeated measures.

latency_model <-
  lmer(latency.difference ~ light.status + (1 | id),
    data = full_data
  )

## boundary (singular) fit: see ?isSingular
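
The latency.difference response described above can be computed as sketched here. The input column names and values are hypothetical, and representing a feeder that was never approached as NA is an assumption made for illustration.

```r
max_latency <- 180  # seconds; feeders never approached get the maximum score

# Hypothetical latencies (s); NA marks a feeder that was never approached.
toy <- data.frame(
  rewarded.latency   = c(12, NA, 150, NA),
  unrewarded.latency = c(90, 40, NA, NA)
)

# Replace missing latencies with the maximum score.
cap_latency <- function(x) ifelse(is.na(x), max_latency, x)

# Unrewarded minus rewarded: positive values mean the rewarded feeder was reached first.
toy$latency.difference <-
  cap_latency(toy$unrewarded.latency) - cap_latency(toy$rewarded.latency)
toy
```

The last toy row shows the case discussed in the text: neither feeder is approached, both latencies are scored as 180, and the difference is 0.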

Results

Table 5: Summary of a linear mixed effect model (Model 2) estimating how much faster guppies approached the rewarded feeder, for which the light had been activated, than the unrewarded feeder with no light activated (model estimates ± S.E.), where the fixed effect is the factor light status (‘before the light’ or ‘after the light’). The effect of light status is non-significant, but the estimate indicates guppies approached the rewarding feeder quicker than the unrewarding feeder after the light came on.
Factor	Estimate	Std. Error	T statistic	df	P value
Intercept	-9.941	21.928	-0.453	32	0.653
Light status	39.612	31.011	1.277	32	0.211

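After the light came on guppies approached the rewarded feeder on average 39.6 seconds faster than the unrewarded feeder, but this effect is non-significant (p = 0.211). The singular-fit message above indicates that the variance of the individual id random effect was estimated at (or very near) zero, which is consistent with the 32 degrees of freedom in Table 5 (34 observations minus 2 fixed-effect parameters).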

Figure 2: Data are means ± SE. Bold line connects means across trials.

Model 3 - Visits to the rewarded feeder

To determine whether individuals increase the frequency with which they come within two body lengths of the rewarded feeder, we fit a binomial generalized linear mixed effects model. Our response variable, denoted by cbind(rewarding.feeder.visits, unrewarding.feeder.visits), is the proportion of visits to the rewarded feeder. This model asks whether the proportion of visits to the rewarded feeder differs between light phases. We fit a random effect of individual id to account for repeated measures.

visits_model <-
  glmer(cbind(rewarding.feeder.visits, unrewarding.feeder.visits) ~ light.status + (1 | id),
    data = full_data,
    family = "binomial"
  )

Results

Table 6: Summary of a binomial generalized linear mixed effects model (Model 3) estimating the proportion of visits guppies made to the activated light side’s feeder (model estimates ± S.E., on the logit scale) where the fixed effect is the factor light status (‘before the light’ or ‘after the light’). The effect of light status is non-significant, but the estimate indicates guppies increase their proportion of visits to the rewarded feeder after its corresponding light had been activated.
Factor	Estimate	Std. Error	Z statistic	P value
Intercept	-0.108	0.238	-0.455	0.649
Light status	0.399	0.254	1.568	0.117

There is a non-significant increase in the proportion of visits to the rewarded feeder after the light turns on (p = 0.117). Guppies go from making 47% of their visits to the rewarded feeder before the light shines to making 57% of their visits to the rewarded feeder after the light shines, an increase of 10 percentage points.
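
The percentages above follow from back-transforming the logit-scale estimates in Table 6 with the inverse logit, available in base R as plogis(). This is a small sketch of that arithmetic using the rounded table values; the object names are illustrative.

```r
# Fixed-effect estimates from Table 6 (logit scale).
intercept    <- -0.108
light_status <-  0.399

# Back-transform to the proportion of visits made to the rewarded feeder.
p_before <- plogis(intercept)
p_after  <- plogis(intercept + light_status)

round(c(before = p_before, after = p_after, change = p_after - p_before), 2)
```

This gives roughly 0.47 before the light and 0.57 after it, a difference of about 0.10 on the proportion scale.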

Figure 3: Data are means ± SE. Bold line connects means across trials.

Main findings

Guppies non-significantly increase the amount of time they spend on the side of the tank where the light shone (Figure 4A)
Guppies non-significantly increase the speed at which they visit the rewarded feeder over the unrewarded feeder after the light shines (Figure 4B)
Guppies non-significantly increase their proportion of visits to the rewarded feeder after the light shines (Figure 4C)

Figure 4: Data are means ± SE. Dashed lines represent mean values for the pre-light baseline of the behaviour being displayed. Squares represent pre-light baselines and circles represent post-light measures. (A) Plot for rewarding side preference in seconds (B) Plot for rewarding feeder latency bias (C) Plot for proportion of visits to the rewarding feeder

We see that while the effect of the light turning on is not significant for any one preference measure, all three measures have an effect in the direction consistent with the hypothesis that guppies can learn a food-light association in the automated chamber.

Our sample of guppies has a high amount of variation leading to estimates with wide confidence intervals. Moreover, our sample is relatively small so the decrease in estimate precision due to sampling error is amplified. There could be several reasons for this variation. Given that we did not explicitly quantify performance during training it may be that guppies that did not perform any of the behaviours suggesting learning of the association did not feed as much as guppies which did, leading to a difference in performance on the test trial. If we assume guppies did receive similar reinforcement, there may still be variation due to individual differences in the expression of learned behaviour.

Individual guppies may express learning in different manners. Some may choose to spend more time on the side with the feeder while others may choose to make more visits to the feeder. In this case, the strongest and most common responses are more likely to produce statistically significant estimates as they will have less variance. Latency we might expect to be a particularly noisy measure because it depends on where an individual was in the tank when the light came on. By chance, individuals may have been further away or closer to the feeder when the light came on which could produce additional variation on the latency metric. The low precision on the estimate of the effect of light status on time spent on the rewarding side of the tank may be due to side biases. While non-significant, guppies that had the light shine on the right side of the tank increased their preference for the right side of the tank more than the guppies that had the light shine on the left side of the tank increased their preference for the left side of the tank. Problematically this could be an artefact of sample size—there were nearly double the guppies for which the light shone on the right side of the tank versus the left (11 on the right vs 6 on the left).

Follow up data exploration

Following the results for our a priori hypotheses we conducted some post-hoc explorations into the data which are detailed in the sections below.

Side bias

From Simon: “I’d suggest also calculating time on left - right, and plotting against side illuminated - this will help to illustrate any side bias”.

There is not a statistically clear side bias either way. For the ‘light on the left’ guppies there appears to be a slight bias, but this may be an artefact of sample size: the confidence interval around the mean is very wide, and there are only 6 guppies for which the light shone on the left compared with almost double that number (11 guppies) for which the light shone on the right. Investigating this in a model reveals no significant effects.
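
The left minus right measure suggested above can be computed and summarised by illuminated side as follows. This is a sketch on made-up values; the time.left and time.right column names are hypothetical.

```r
# Hypothetical data: one row per guppy and light phase; values are made up.
toy <- data.frame(
  light.side = c("left", "left", "right", "right"),
  time.left  = c(100, 70, 60, 95),
  time.right = c(80, 110, 120, 85)
)

# Positive values mean more time on the left, regardless of where the light was.
toy$left.minus.right <- toy$time.left - toy$time.right

# Mean left minus right difference for each illuminated side.
side_means <- aggregate(left.minus.right ~ light.side, data = toy, FUN = mean)
side_means
```

Unlike rewarding side preference, this measure has the same sign convention for both groups, so a consistent offset from zero in both groups would point to a side bias rather than a response to the light.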

Time bins

From Simon: “One possibility is that an initial preference is wiped out once fish discover no food. So you could look at 1st minute or whatever time period seems sensible”

Behaviour might be structured temporally so we wanted to see whether there were differences in behaviour that were apparent by binning the data into 1 minute bins.
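
One way to bin tracking data into 1 minute bins is with base R's cut(). The sketch below assumes a hypothetical table with one position sample per row and a time stamp in seconds; neither the column names nor the values come from the analysis file.

```r
# Hypothetical tracking output: one position sample per row, time in seconds.
track <- data.frame(
  time = c(5, 30, 59.9, 60, 100, 130, 179),
  side = c("left", "right", "right", "left", "left", "right", "right")
)

# Assign each sample to a 1 minute bin: [0,60), [60,120), [120,180].
track$minute <- cut(track$time,
  breaks = c(0, 60, 120, 180),
  labels = c("min1", "min2", "min3"),
  right = FALSE, include.lowest = TRUE
)

# Number of samples on each side within each bin.
bin_counts <- table(track$minute, track$side)
bin_counts
```

With a fixed sampling rate, these counts are proportional to the time spent on each side within each minute, so the preference measures used above could be recomputed per bin.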

Tools used and References

A complete list of the tools used is produced below:

Package	Version	Reference
broom	0.5.5	David Robinson and Alex Hayes (2020). broom: Convert Statistical Analysis Objects into Tidy Tibbles. R package version 0.5.5. https://CRAN.R-project.org/package=broom
broom.mixed	0.2.6	Ben Bolker and David Robinson (2020). broom.mixed: Tidying Methods for Mixed Models. R package version 0.2.6. https://CRAN.R-project.org/package=broom.mixed
carData	3.0.3	John Fox, Sanford Weisberg and Brad Price (2019). carData: Companion to Applied Regression Data Sets. R package version 3.0-3. https://CRAN.R-project.org/package=carData
cowplot	1.0.0	Claus O. Wilke (2019). cowplot: Streamlined Plot Theme and Plot Annotations for ‘ggplot2’. R package version 1.0.0. https://CRAN.R-project.org/package=cowplot
DHARMa	0.3.3.0	Florian Hartig (2020). DHARMa: Residual Diagnostics for Hierarchical (Multi-Level / Mixed) Regression Models. R package version 0.3.3.0. http://florianhartig.github.io/DHARMa/
dplyr	1.0.3	Hadley Wickham, Romain François, Lionel Henry and Kirill Müller (2021). dplyr: A Grammar of Data Manipulation. R package version 1.0.3. https://CRAN.R-project.org/package=dplyr
effects	4.1.4	John Fox and Sanford Weisberg (2019). An R Companion to Applied Regression, 3rd Edition. Thousand Oaks, CA http://tinyurl.com/carbook
emmeans	1.5.1	Russell Lenth (2020). emmeans: Estimated Marginal Means, aka Least-Squares Means. R package version 1.5.1. https://CRAN.R-project.org/package=emmeans
ggplot2	3.3.3	H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.
ggpubr	0.2.5	Alboukadel Kassambara (2020). ggpubr: ‘ggplot2’ Based Publication Ready Plots. R package version 0.2.5. https://CRAN.R-project.org/package=ggpubr
knitr	1.30	Yihui Xie (2020). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.30.
lme4	1.1.21	Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
lmerTest	3.1.1	Kuznetsova A, Brockhoff PB, Christensen RHB (2017). “lmerTest Package: Tests in Linear Mixed Effects Models.” Journal of Statistical Software, 82(13), 1-26. doi: 10.18637/jss.v082.i13 (URL: https://doi.org/10.18637/jss.v082.i13).
magrittr	2.0.1	Stefan Milton Bache and Hadley Wickham (2020). magrittr: A Forward-Pipe Operator for R. R package version 2.0.1. https://CRAN.R-project.org/package=magrittr
Matrix	1.2.18	Douglas Bates and Martin Maechler (2019). Matrix: Sparse and Dense Matrix Classes and Methods. R package version 1.2-18. https://CRAN.R-project.org/package=Matrix
patchwork	1.1.0.9000	Thomas Lin Pedersen (2021). patchwork: The Composer of Plots. https://patchwork.data-imaginist.com, https://github.com/thomasp85/patchwork.
R	3.6.2	R Core Team (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
report	0.2.0	Makowski, D., Ben-Shachar, M.S., Patil, I. & Lüdecke, D. (2020). Automated reporting as a practical tool to improve reproducibility and methodological best practices adoption. CRAN. Available from https://github.com/easystats/report.
tidyext	0.3.6	Michael Clark (2021). tidyext: Tidy Extensions for Data Processing. https://m-clark.github.io/tidyext, https://github.com/m-clark/tidyext.
tidyr	1.0.2	Hadley Wickham and Lionel Henry (2020). tidyr: Tidy Messy Data. R package version 1.0.2. https://CRAN.R-project.org/package=tidyr