Sexual dimorphism in trait variability and its eco-evolutionary and statistical implications

SusanneRKZajitschek
FelixZajitschek
RussellBonduriansky
RobertCBrooks
WillCornwell
DanielSFalster
MalgorzataLagisz
JeremyMason
AlistairMSenior
DanielWANoble
ShinichiNakagawa
School of BiologicalEarth and Environmental Sciences University of New South WalesEvolution & Ecology Research Center
School of Biological and EnvironmentalSciences Liverpool United Kingdom;Liverpool John Moores University
Anonymous
CharlesPerkins Centre School of Life and Environmental SciencesUniversity of Sydney
ResearchSchool of Biology Australian National UniversityDivision of Ecology and Evolution

Abstract

Biomedical and clinical sciences are experiencing a renewed interest in the fact that males and females differ in many anatomic, physiological, and behavioural traits. Sex differences in trait variability, however, are yet to receive similar recognition. In medical science, mammalian females are assumed to have higher trait variability due to estrous cycles (the ‘estrus-mediated variability hypothesis’); historically in biomedical research, females have been excluded for this reason. Contrastingly, evolutionary theory and associated data support the ‘greater male variability hypothesis’. Here, we test these competing hypotheses in 218 traits measured in >26,900 mice, using meta-analysis methods. Neither hypothesis could universally explain patterns in trait variability. Sex bias in variability was trait-dependent. While greater male variability was found in morphological traits, females were much more variable in immunological traits. Sex-specific variability has eco-evolutionary ramifications, including sex-dependent responses to climate change, as well as statistical implications including power analysis considering sex difference in variance.

Introduction

Sex differences arise because selection acts on the two sexes differently, especially on traits associated with mating and reproduction (Darwin, 1871). Therefore, sex differences are widespread, a fact which is unsurprising to any evolutionary biologist. However, scientists in many (bio-)medical fields have not necessarily regarded sex as a biological factor of intrinsic interest (Clayton, 2016; Flanagan, 2014; Karp et al., 2017; Klein et al., 2015; Prendergast et al., 2014; Shansky and Woolley, 2016). Therefore, many (bio-)medical studies have only been conducted with male subjects. Consequently, our knowledge is biased. For example, we know far more about drug efficacy in male compared to female subjects, contributing to a poor understanding of how the sexes respond differently to medical interventions (Nowogrodzki, 2017). This gap in knowledge is predicted to lead to overmedication and adverse drug reactions in women (Zucker and Prendergast, 2020). Only recently have (bio-)medical scientists started considering sex differences in their research (Dorris et al., 2015; Ingvorsen et al., 2017; Robinson et al., 2017; Smarr et al., 2017;Ahmad et al., 2017; Foltin and Evans, 2018; Thompson et al., 2018). Indeed, the National Insti- tutes of Health (NIH) have now implemented new guidelines for animal and human research study designs, requiring that sex be included as a biological variable (Clayton, 2016; Clayton and Collins, 2014; NIH, 2015a). When comparing the sexes, biologists generally focus on mean differences in trait values, placing little or no emphasis on sex differences in trait variability (see Figure 1 for a diagram explaining differences in means and variances). Despite this, two hypotheses exist that explain why trait variability might be expected to differ between the sexes. Interestingly, these two hypotheses make opposing predictions.

this stupid figure has to get embedded Fig1

First, the ‘estrus-mediated variability hypothesis’ (Figure 2), which emerged in the (bio-)medical research field, assumes that the female estrous cycle (see e.g. Prendergast et al., 2014; Beery and Zucker, 2011) causes higher variability across traits in female subjects. A wide range of labile traits are presumed to co-vary with physiological changes that are induced by reproductive hormones. High variability is, therefore, expected to be particularly prominent when the stage of the estrous cycle is unknown and unaccounted for. This higher trait variability, resulting from females being at different stages of their estrous cycle, is the main reason for why female research subjects are often excluded from biomedical research trials, especially in the fields of neuroscience, physiology and pharmacology (NIH, 2015a). Female exclusion has traditionally been justified based on the grounds that including females in empirical research leads to a loss of statistical power, or that animals must be sampled across the estrous cycle for one to make valid conclusions, requiring more time and resources.

Fig2

Second, the ‘greater male variability hypothesis’ suggests males exhibit higher trait variability because of two different mechanisms. The first mechanism is based on males being the heteroga- metic sex in mammals. Mammalian females possess two X chromosomes, leading to an ‘averaging’ of trait expression across the genes on each chromosome. In contrast, males exhibit greater variance because expression of genes on a single X chromosome is likely to lead to more extreme trait values (Reinhold and Engqvist, 2013). The second mechanism is based on males being under stronger sex- ual selection (Pomiankowski and Moller, 1995; Cuervo and Møller, 1999; Cuervo and Møller,2001). Empirical evidence supports higher variability of traits that are sexually selected, often har- bouring high genetic variance and being condition-dependent, which makes sense as ‘condition’ as a trait is likely to be based on numerous loci (Rowe and Houle, 1996; Tomkins et al., 2004). Thus, higher genetic and, thus, phenotypic variance resulting from sexual selection is expected to charac- terise sexually selected traits. In mammals, it is likely that both mechanisms are operating concomi- tantly. So far, the ‘greater male variability hypothesis’ has gained some support in the evolutionary and psychological literature (Reinhold and Engqvist, 2013; Lehre et al., 2009). Here, we conduct the first comprehensive test of the greater male variability and estrus-mediated variability hypotheses in mice (Figure 2; Reinhold and Engqvist, 2013; Johnson et al., 2008;Hedges and Nowell, 1995; Itoh and Arnold, 2015; Becker et al., 2016; Beery, 2018), examining sex differences in variance across 218 traits in 26,916 animals. To this end, we carry out a series of meta-analyses in two steps (Figure 3). First, we quantify the natural logarithm of the male to female coefficients of variation, CV, or relative variance (lnCVR) for each cohort (population) of mice, for dif- ferent traits, along with the variability ratio of male to female standard deviations, SD, on the log scale (lnVR, following Nakagawa et al., 2015, see Figure 1). Then, we analyse these effect sizes to quantify sex bias in variance for each trait using meta-analytic methods. To better understand our results, and match them to previously reported sex differences in trait means (Karp et al., 2017), we also quantify and analyse the log response ratio (lnRR). Next, we statistically amalgamate the trait- level results to test our hypotheses and to quantify the degree of sex bias in and across nine functional trait groups (for details on the grouping, see below). Our meta-analytic approach allows easy interpretation and comparison with earlier and future studies. Further, the proposed method using lnCVR (and lnVR) is probably the only practical method to compare variability between two sexes within and across studies (Nakagawa et al., 2015; Senior et al., 2020), as far as we are aware. Also, the use of a ratio (i.e. lnRR, lnVR, lnCVR) between two groups (males and females) naturally controls for different units (e.g. cm, g, ml) as well as for changes in traits over time and space.

Results

Data characteristics and workflow We used a dataset compiled by the International Mouse Phenotyping Consortium (Dickinson et al., 2016) (IMPC, dataset acquired 6/2018). To gain insight into systematic sex differences, we only included data of wildtype-strain adult mice, between 100 and 500 days of age. We removed cases with missing data, and selected measurements that were closest to 100 days of age (young adult) when multiple measurements of the same trait were available. To obtain robust estimates of sex dif- ferences, we only used data on traits that were measured in at least two different institutions (see workflow diagram, Figure 3).

Fig3

Our dataset comprised 218 continuous traits (after initial data cleaning and pre-processing; Figure 3). It contains information from 26,916 mice from nine wildtype strains that were studied across 11 institutions. We combined mouse strain/institution information to create a biological grouping variable (referred to as ‘population’ in Figure 3B; see also Supplementary file 1, Table 1 for details), and the mean and variance of a trait for each population was quantified. We assigned traits according to related procedures into functionally and/or procedurally related trait groups to enhance interpretability (referred to as ‘functional groups’ hereafter; see also Figure 3G). Our nine functional trait groups were: behaviour, morphology, metabolism, physiology, immunology, hematology, heart, hearing and eye (for the rationale of these functional groups and related details, see Methods and Supplementary file 1, Table 3).

packages1

library(pacman)
pacman::p_load(readr, dplyr,metafor, devtools, purrr, tidyverse, tidyr, tibble, kableExtra, robumeta, ggpubr, ggplot2, png, grid, here, knitr, pander, splus2R)
data <- readRDS(here("export", "data_clean.rds")) 
procedures <- read_csv(here("data", "procedures.csv"))

n <- length(unique(data$id))
# Create dataframe to store results
results_alltraits_grouping <- 
    data.frame(tibble(id = 1:n, 
           lnCVR=0, lnCVR_lower=0, lnCVR_upper=0, lnCVR_se=0, lnCVR_I2=0,
            lnVR=0,  lnVR_lower=0,  lnVR_upper=0, lnVR_se=0, lnVR_I2=0,
            lnRR=0,  lnRR_lower=0,  lnRR_upper=0, lnRR_se=0, lnRR_I2=0,
           k=0, trait=0, male_N = 0, female_N=0, min_male_N = 0, max_male_N = 0, mean_male_N = 0,
              min_female_N = 0, max_female_N = 0, mean_female_N = 0))
for (t in 1:n) {
  tryCatch(
    {
      results <- data %>%
                 data_subset_parameterid_individual_by_age(t) %>%
                 calculate_population_stats()                 %>%
                 create_meta_analysis_effect_sizes()          %>%
                 mutate(err = seq_len(n()))
      # lnCVR,  log repsonse-ratio of the coefficient of variance
      cvr <- metafor::rma.mv(yi = effect_size_CVR, V = sample_variance_CVR, 
                             random = list(~ 1 | strain_name, 
                                           ~ 1 | production_center, 
                                           ~ 1 | err), 
                             control = list(optimizer = "optim", 
                                            optmethod = "Nelder-Mead", 
                                            maxit = 1000), 
                             verbose = F, data = results)

# lnVR, comparison of standard deviations
      cv <- metafor::rma.mv(yi = effect_size_VR, V = sample_variance_VR,
                            random = list(~ 1 | strain_name, 
                                          ~ 1 | production_center, 
                                          ~ 1 | err), 
                            control = list(optimizer = "optim", 
                                           optmethod = "Nelder-Mead", 
                                           maxit = 1000), 
                            verbose = F, data = results)
      # for means, lnRR
      means <- metafor::rma.mv(yi = effect_size_ROM, V = sample_variance_ROM, 
                               random = list(~ 1 | strain_name, 
                                             ~ 1 | production_center, 
                                             ~ 1 | err), 
                               control = list(optimizer = "optim", 
                                              optmethod = "Nelder-Mead", 
                                              maxit = 1000), 
                               verbose = F, data = results)
      
      f <- function(x) unlist(x[c("b", "ci.lb", "ci.ub", "se")])
      
      extract_I2 <- function(mod){
                  sigma2 <- mod$sigma2
                       w <- mod$vi
                       k <- mod$k
                  
                  sigmaM <- sum(w * (k-1)) / (sum(w)^2 - sum(w^2))
                  
                    I2_tot <- round(sum(sigma2) / sum(sigma2 + sigmaM), digits = 3)*100
                  return(I2_tot)
      }
      results_alltraits_grouping[t, c(2:17,19:26)] <- c(f(cvr),lnCVR_I2 = extract_I2(cvr), f(cv), lnVR_I2 = extract_I2(cv), f(means), lnRR_I2 = extract_I2(means), k=means$k, male_N = sum(results$male_n_ind), female_N = sum(results$female_n_ind), min_male_N = min(results$male_n_ind), max_male_N = max(results$male_n_ind), mean_male_N = mean(results$male_n_ind),
                                                        min_female_N = min(results$female_n_ind), max_female_N = max(results$female_n_ind), mean_female_N = mean(results$female_n_ind))
      results_alltraits_grouping[t, 18]   <- unique(results$trait)
    },
    error = function(e) {
      cat("ERROR :", t, conditionMessage(e), "\n")
    }
  )
}

results_alltraits_grouping2 <- 
  results_alltraits_grouping %>% 
  
  # Join data with results_alltraits_grouping. We filter duplicated id's to get only one unique row per id (and there is one id per parameter_name)
  left_join(by="id",
             data %>% 
             select(id, parameter_group, procedure = procedure_name, procedure_name, parameter_name) %>%  
             filter(!duplicated(id))) %>%
 
  # Below we add 'procedure' (from the previously loaded 'procedures.csv') as a variable; n <- length(unique(results_alltraits_grouping2$parameter_name))) should equal 232
  left_join(by="procedure", 
            procedures %>% distinct())
# We exclude 14 parameter names for which metafor models didn't converge ("dp t cells", "mzb (cd21/35 high)"), and of parameters that don't harbour enough variation
meta_clean <- results_alltraits_grouping2 %>% 
	            filter(!parameter_name %in% 
	                     c("dp t cells", "mzb (cd21/35 high)", "number of caudal vertebrae", 
	                       "number of cervical vertebrae", "number of digits", "number of lumbar vertebrae", 
	                       "number of pelvic vertebrae", "number of ribs left","number of ribs right", 
	                       "number of signals", "number of thoracic vertebrae", "total number of acquired events in panel a",
	                       "total number of acquired events in panel b", "whole arena permanence"))
# Summary of the cleaned data for 218 traits
  sd(meta_clean$k)
  range(meta_clean$male_N)
  range(meta_clean$female_N)
  range(meta_clean$lnCVR_I2)
  range(meta_clean$lnVR_I2)
  range(meta_clean$lnRR_I2)
  range(meta_clean$k)
  mean(meta_clean$mean_male_N)
  range(meta_clean$min_male_N)
  median(meta_clean$mean_male_N)
  sd(meta_clean$mean_male_N)
  sd(meta_clean$mean_male_N) / sqrt(length(meta_clean$mean_male_N))
  mean(meta_clean$mean_female_N)
  median(meta_clean$mean_female_N)
  range(meta_clean$min_female_N)
  sd(meta_clean$mean_female_N)
  sd(meta_clean$mean_female_N) / sqrt(length(meta_clean$mean_female_N))
  
  meta1_sub <- meta_clean %>%
  # Add summary of number of parameter names in each parameter group
  group_by(parameter_group) %>%
  mutate(par_group_size = length(unique(parameter_name)), 
         sampleSize = as.numeric(k)) %>% 
  ungroup() %>% 
  # Create subsets with > 1 count (par_group_size > 1)
  filter(par_group_size > 1) # 90 observations
  
  # Create summary of number of parameter names in each parameter group, and merge back together
meta1b <-
  meta1 %>%
  group_by(parameter_group) %>% 
  summarize(par_group_size = length(unique(parameter_name, na.rm = TRUE)))
meta1$par_group_size <- meta1b$par_group_size[match(meta1$parameter_group, meta1b$parameter_group)]
# Create subsets with > 1 count (par_group_size > 1) 
           meta1_sub <- subset(meta1,par_group_size >1) # 90 observations   
meta1_sub$sampleSize <- as.numeric(meta1_sub$k)
# Nesting and meta-analyses on correlated traits, using robumeta
n_count <- meta1_sub %>%
           group_by(parameter_group) %>%
           mutate(raw_N = sum(sampleSize)) %>%
           nest() %>%
           ungroup()
model_count <- n_count %>%
  mutate(
     model_lnRR = map(data, ~ robu(.x$lnRR ~ 1,  data = .x, studynum = .x$id, modelweights = c("CORR"), rho = 0.8, small = TRUE, var.eff.size = (.x$lnRR_se)^2)),
     model_lnVR = map(data, ~ robu(.x$lnVR ~ 1,  data = .x, studynum = .x$id, modelweights = c("CORR"), rho = 0.8, small = TRUE, var.eff.size = (.x$lnVR_se)^2)),
    model_lnCVR = map(data, ~ robu(.x$lnCVR ~ 1, data = .x, studynum = .x$id, modelweights = c("CORR"), rho = 0.8, small = TRUE, var.eff.size = (.x$lnCVR_se)^2))
  )
#### Extracting and save parameter estimates
# Here we apply an additional function to collect the outcomes of the 'mini-meta-analysis' that has condensed our non-independent traits. Values from our second-order meta-analysis using robu-meta are then extracted
count_fun <- function(mod_sub) {
  return(c(mod_sub$reg_table$b.r, mod_sub$reg_table$CI.L, mod_sub$reg_table$CI.U, mod_sub$reg_table$SE))
} # estimate, lower ci, upper ci, SE

# Extraction of values created during meta-analyses using robumeta
robusub_RR <- model_count %>%
              transmute(parameter_group, 
                        estimatelnRR = map(model_lnRR, count_fun)) %>%
              mutate(r = map(estimatelnRR, 
                             ~ data.frame(t(.)))) %>%
              unnest(r) %>%
              select(-estimatelnRR) %>%
              purrr::set_names(c("parameter_group", "lnRR", "lnRR_lower", "lnRR_upper", "lnRR_se"))
robusub_CVR <- model_count %>%
               transmute(parameter_group, 
                         estimatelnCVR = map(model_lnCVR, count_fun)) %>%
               mutate(r = map(estimatelnCVR, 
                              ~ data.frame(t(.)))) %>%
               unnest(r) %>%
               select(-estimatelnCVR) %>%
               purrr::set_names(c("parameter_group", "lnCVR", "lnCVR_lower", "lnCVR_upper", "lnCVR_se"))
robusub_VR <- model_count %>%
              transmute(parameter_group, 
                        estimatelnVR = map(model_lnVR, count_fun)) %>%
              mutate(r = map(estimatelnVR, 
                             ~ data.frame(t(.)))) %>%
              unnest(r) %>%
              select(-estimatelnVR) %>%
              purrr::set_names(c("parameter_group", "lnVR", "lnVR_lower", "lnVR_upper", "lnVR_se"))
robu_all <- full_join(robusub_CVR, robusub_VR) %>% full_join(., robusub_RR)
#### Combining data 
# Merge the two data sets (the new [robu_all] and the initial [uncorrelated sub-traits with count = 1]) 
meta_all <- meta1 %>%
            filter(par_group_size == 1) %>%
            as_tibble()
# Step 1:  Columns are matched by name (in our case, 'parameter_group'), and any missing columns will be filled with NA
  combinedmeta <- bind_rows(robu_all, meta_all)
# glimpse(combinedmeta)
# Steps 2&3: Add information about number of traits in a parameter group, procedure, and grouping term
              metacombo <- combinedmeta
       metacombo$counts <- meta1$par_group_size[match(metacombo$parameter_group, meta1$parameter_group)] 
   metacombo$procedure2 <- meta1$procedure[match(metacombo$parameter_group, meta1$parameter_group)]
metacombo$GroupingTerm2 <- meta1$GroupingTerm[match(metacombo$parameter_group, meta1$parameter_group)]
# Clean-up, reorder, and rename 
metacombo <- metacombo[c("parameter_group", "counts","procedure2","GroupingTerm2", "lnCVR","lnCVR_lower","lnCVR_upper","lnCVR_se","lnVR","lnVR_lower","lnVR_upper","lnVR_se","lnRR","lnRR_lower","lnRR_upper","lnRR_se")] 
   names(metacombo)[names(metacombo)=="procedure2"] <- "procedure" 
names(metacombo)[names(metacombo)=="GroupingTerm2"] <- "GroupingTerm" 

metacombo_final <- metacombo %>%
                   group_by(GroupingTerm) %>%
                   nest()

metacombo_final <- metacombo_final %>% 
                   mutate(para_per_GroupingTerm = map_dbl(data, nrow))
# For all grouping terms
metacombo_final_all <- metacombo %>%
                       nest(data = everything())
# Final random effects meta-analyses within grouping terms, with SE of the estimate
overall1 <- metacombo_final %>%
  mutate(
    model_lnCVR = map(data, ~ metafor::rma.uni(
                      yi = .x$lnCVR, sei = (.x$lnCVR_upper - .x$lnCVR_lower) / (2 * 1.96),
                      control = list(optimizer = "optim", 
                                     optmethod = "Nelder-Mead", 
                                     maxit = 1000), 
                      verbose = F)),
    model_lnVR = map(data, ~ metafor::rma.uni(
                    yi = .x$lnVR, sei = (.x$lnVR_upper - .x$lnVR_lower) / (2 * 1.96),
                    control = list(optimizer = "optim", 
                                   optmethod = "Nelder-Mead", 
                                   maxit = 1000), 
                    verbose = F)),
    model_lnRR = map(data, ~ metafor::rma.uni(
                      yi = .x$lnRR, sei = (.x$lnRR_upper - .x$lnRR_lower) / (2 * 1.96),
                      control = list(optimizer = "optim", 
                                     optmethod = "Nelder-Mead", 
                                     maxit = 1000), 
                      verbose = F)))
# Final random effects meta-analyses ACROSS grouping terms, with SE of the estimate
overall_all1 <- metacombo_final_all %>%
  mutate(
    model_lnCVR = map(data, ~ metafor::rma.uni(
                      yi = .x$lnCVR, sei = (.x$lnCVR_upper - .x$lnCVR_lower) / (2 * 1.96),
                      control = list(optimizer = "optim", 
                                     optmethod = "Nelder-Mead", 
                                     maxit = 1000), 
                      verbose = F)),
    model_lnVR = map(data, ~ metafor::rma.uni(
                    yi = .x$lnVR, sei = (.x$lnVR_upper - .x$lnVR_lower) / (2 * 1.96),
                    control = list(optimizer = "optim", 
                                   optmethod = "Nelder-Mead", 
                                   maxit = 1000), 
                    verbose = F)),
    model_lnRR = map(data, ~ metafor::rma.uni(
                    yi = .x$lnRR, sei = (.x$lnRR_upper - .x$lnRR_lower) / (2 * 1.96),
                    control = list(optimizer = "optim", 
                                   optmethod = "Nelder-Mead", 
                                   maxit = 1000), 
                    verbose = F)))
                    
  # Function will take the averall results and extract the relevant trait of interest
extract_trait <- function(data, trait){
  tmp <- data %>% 
        filter(., GroupingTerm == trait) %>% 
        mutate(      lnCVR = .[[4]][[1]]$b, 
               lnCVR_lower = .[[4]][[1]]$ci.lb, 
               lnCVR_upper = .[[4]][[1]]$ci.ub, 
                  lnCVR_se = .[[4]][[1]]$se, 
                  lnCVR_I2 = .[[4]][[1]]$I2,
                      lnVR = .[[5]][[1]]$b,  
                lnVR_lower = .[[5]][[1]]$ci.lb,  
                lnVR_upper = .[[5]][[1]]$ci.ub,  
                   lnVR_se = .[[5]][[1]]$se,  
                   lnVR_I2 = .[[5]][[1]]$I2,
                      lnRR = .[[6]][[1]]$b,  
                lnRR_lower = .[[6]][[1]]$ci.lb,  
                lnRR_upper = .[[6]][[1]]$ci.ub,  
                   lnRR_se = .[[6]][[1]]$se,  
                   lnRR_I2 = .[[6]][[1]]$I2)  %>%
        select(., GroupingTerm, lnCVR:lnRR_I2)
  
  return(tmp)
}
All <- overall_all1 %>% 
       mutate(      lnCVR = .[[2]][[1]]$b, 
              lnCVR_lower = .[[2]][[1]]$ci.lb, 
              lnCVR_upper = .[[2]][[1]]$ci.ub, 
                 lnCVR_se = .[[2]][[1]]$se, 
                 lnCVR_I2 = .[[2]][[1]]$I2,
                     lnVR = .[[3]][[1]]$b,  
               lnVR_lower = .[[3]][[1]]$ci.lb,  
               lnVR_upper = .[[3]][[1]]$ci.ub,  
                  lnVR_se = .[[3]][[1]]$se,  
                  lnVR_I2 = .[[3]][[1]]$I2,
                     lnRR = .[[4]][[1]]$b,  
               lnRR_lower = .[[4]][[1]]$ci.lb,  
               lnRR_upper = .[[4]][[1]]$ci.ub,  
                  lnRR_se = .[[4]][[1]]$se,  
                  lnRR_I2 = .[[4]][[1]]$I2,)  %>%
       select(., lnCVR:lnRR_I2)
       
All <- All %>% mutate(GroupingTerm = "All")

overall2 <- bind_rows(extract_trait(overall1, "Behaviour"), 
                      extract_trait(overall1, "Morphology"), 
                      extract_trait(overall1, "Metabolism"), 
                      extract_trait(overall1, "Physiology"), 
                      extract_trait(overall1, "Immunology"), 
                      extract_trait(overall1, "Hematology"), 
                      extract_trait(overall1, "Heart"), 
                      extract_trait(overall1, "Hearing"), 
                      extract_trait(overall1, "Eye"), 
                      All) 

meta_clean$GroupingTerm <- factor(meta_clean$GroupingTerm, 
                                  levels = c("Behaviour", "Morphology", "Metabolism", "Physiology", "Immunology", "Hematology", "Heart", "Hearing", "Eye"))
meta_clean$GroupingTerm <- factor(meta_clean$GroupingTerm, 
                                  rev(levels(meta_clean$GroupingTerm)))
#Preparing data for all traits
meta.plot2.all <- meta_clean %>%
                  select(lnCVR, lnVR, lnRR, GroupingTerm) %>%
                  arrange(GroupingTerm)
#lnVR has been removed here and in the steps below, as this is only included in the supplemental figure
      meta.plot2.all.b <- gather(meta.plot2.all, trait, value, c(lnCVR, lnRR)) 
meta.plot2.all.b$trait <- factor(meta.plot2.all.b$trait, levels = c("lnCVR", "lnRR")) 
      meta.plot2.all.c <- meta.plot2.all.b %>%
                          group_by_at(vars(trait, GroupingTerm)) %>%
                          summarise(
                            malebias = sum(value > 0), femalebias = sum(value <= 0), total = malebias + femalebias,
                            malepercent = malebias * 100 / total, femalepercent = femalebias * 100 / total
                          )
meta.plot2.all.c$label <- "All traits"
# Re-structure to create stacked bar plots
meta.plot2.all.d <- as.data.frame(meta.plot2.all.c)
meta.plot2.all.e <- gather(meta.plot2.all.d, 
                           key = sex, 
                           value = percent, 
                           malepercent:femalepercent, 
                           factor_key = TRUE)
# Create new sample size variable
meta.plot2.all.e$samplesize <- with(meta.plot2.all.e, 
                                    ifelse(sex == "malepercent", 
                                           malebias, 
                                           femalebias))
# Add summary row ('All') and re-arrange rows into correct order for plotting (warnings about coercing 'id' into character vector are ok)
meta.plot2.all.f <- meta.plot2.all.e %>% 
                    group_by(trait, sex) %>% 
	                  summarise(GroupingTerm = "All", 
	                            malebias = sum(malebias),
	                            femalebias = sum(femalebias),
	                            total = malebias + femalebias, 
	                            label = "All traits", 
	                            samplesize = sum(samplesize)) %>%
	                  mutate(percent = ifelse(sex == "femalepercent", femalebias*100/(malebias+femalebias), malebias*100/(malebias+femalebias))) %>%
	                  bind_rows(meta.plot2.all.e, .) %>%
	                  mutate(rownumber = row_number()) %>%
	                  .[c(37, 1:9, 39, 10:18, 38, 19:27, 40, 28:36), ] 
  #line references in previous code line corresponding to: 
  #'lnCVR(male(All)), lnCVR(male('single grouping terms'), lnRR(male(All)), lnRR(male('single grouping terms')),
  #lnCVR(female(All)), lnCVR(female('single grouping terms'), lnRR(female(All)), lnRR(female('single grouping terms'))'
meta.plot2.all.f$GroupingTerm <- factor(meta.plot2.all.f$GroupingTerm,
                                        levels = c("Behaviour", "Morphology", "Metabolism", "Physiology", "Immunology", "Hematology", "Heart", "Hearing", "Eye", "All")) 
meta.plot2.all.f$GroupingTerm <- factor(meta.plot2.all.f$GroupingTerm, 
                                        rev(levels(meta.plot2.all.f$GroupingTerm)))
malebias_Fig2_alltraits <-
    ggplot(meta.plot2.all.f) +
    aes(x = GroupingTerm, y = percent, fill = sex) +
    ylab("Percentage") +
    geom_col() +
    geom_hline(yintercept = 50, linetype = "dashed", color = "gray40") +
    geom_text(
      data = subset(meta.plot2.all.f, samplesize != 0), aes(label = samplesize), position = position_stack(vjust = .5),
      color = "white", size = 3.5
    ) +
    facet_grid(
      cols = vars(trait), rows = vars(label), labeller = label_wrap_gen(width = 18),
      scales = "free", space = "free"
    ) +  
    scale_fill_brewer(palette = "Set2") +
    theme_bw(base_size = 18) +
    theme(
      strip.text.y = element_text(angle = 270, size = 10, margin = margin(t = 15, r = 15, b = 15, l = 15)),
      strip.text.x = element_text(size = 12),
      strip.background = element_rect(colour = NULL, linetype = "blank", fill = "gray90"),
      text = element_text(size = 14),
      panel.spacing = unit(0.5, "lines"),
      panel.border = element_blank(),
      axis.line = element_line(),
      panel.grid.major.x = element_line(linetype = "solid", colour = "gray95"),
      panel.grid.major.y = element_line(linetype = "solid", color = "gray95"),
      panel.grid.minor.y = element_blank(),
      panel.grid.minor.x = element_blank(),
      legend.position= "none",
      #axis.title.x = Percentage,
      axis.title.y = element_blank()
    ) +
    coord_flip()
    overall3 <- gather(overall2, parameter, value, c(lnCVR, lnRR), factor_key = TRUE) 
lnCVR.ci <- overall3 %>%
           filter(parameter == "lnCVR") %>%
           mutate(ci.low = lnCVR_lower, ci.high = lnCVR_upper)
lnVR.ci <- overall3 %>%
          filter(parameter == "lnVR") %>%
          mutate(ci.low = lnVR_lower, ci.high = lnVR_upper)
lnRR.ci <- overall3 %>%
          filter(parameter == "lnRR") %>%
          mutate(ci.low = lnRR_lower, ci.high = lnRR_upper)
overall4 <- bind_rows(lnCVR.ci, lnRR.ci) %>% select(GroupingTerm, parameter, value, ci.low, ci.high) 
# Re-order grouping terms
overall4$GroupingTerm <- factor(overall4$GroupingTerm, 
                                levels = c("Behaviour", "Morphology", "Metabolism", "Physiology", "Immunology", "Hematology", "Heart", "Hearing", "Eye", "All"))
overall4$GroupingTerm <- factor(overall4$GroupingTerm, 
                                rev(levels(overall4$GroupingTerm)))
overall4$label <- "All traits"
####  PLOT 4B
#adding male / female symbols for Figure 4B
# additional packages for intergating male / female symbols in plot 4
library(png)
library(grid)
  male <- readPNG(here("images", "MaleAqua.png"))
female <- readPNG(here("images", "FemaleSalmon.png"))
Metameta_Fig3_alltraits <- overall4 %>%
  ggplot(aes(y = GroupingTerm, x = value)) +
  geom_errorbarh(aes(
    xmin = ci.low,
    xmax = ci.high
  ),
  height = 0.1, show.legend = FALSE
  ) +
  geom_point(aes(shape = parameter),
    fill = "black",
    color = "black", size = 2.2,
    show.legend = FALSE
  ) +
  scale_x_continuous(
    limits = c(-0.24, 0.25),
    breaks = c(-0.2, -0.1, 0, 0.1, 0.2),
    name = "Effect size"
  ) +
  geom_vline(
    xintercept = 0,
    color = "black",
    linetype = "dashed"
  ) +
  facet_grid(
    cols = vars(parameter), rows = vars(label),
    labeller = label_wrap_gen(width = 23),
    scales = "free",
    space = "free"
  ) +
  theme_bw() +
  theme(
    strip.text.y = element_text(angle = 270, size = 10, margin = margin(t = 15, r = 15, b = 15, l = 15)),
    strip.text.x = element_text(size = 12),
    strip.background = element_rect(colour = NULL, linetype = "blank", fill = "gray90"),
    text = element_text(size = 14),
    panel.spacing = unit(0.5, "lines"),
    panel.border = element_blank(),
    axis.line = element_line(),
    panel.grid.major.x = element_line(linetype = "solid", colour = "gray95"),
    panel.grid.major.y = element_line(linetype = "solid", color = "gray95"),
    panel.grid.minor.y = element_blank(),
    panel.grid.minor.x = element_blank(),
    legend.title = element_blank(),
    axis.title.x = element_text(hjust = 0.5, size = 14),
    axis.title.y = element_blank()) +
  # addition of male & female symols
  annotation_custom(rasterGrob(female), xmin = - 0.2, xmax = -0.1, ymin = 2.3, ymax = 4) +
  annotation_custom(rasterGrob(male), xmin = 0.1, xmax = 0.2, ymin = 2.3, ymax = 4)

blah blah

Fig3

Fig4 <- ggarrange(malebias_Fig2_alltraits, Metameta_Fig3_alltraits,  nrow = 2, align = "v", heights = c(10, 9), labels = c("A", "B"))
Fig4

Panel A shows the numbers of traits across functional groups that are either male-biased (blue-green) or female-biased (orange-red)

Testing the two hypotheses

We found that some means and variabilities of traits were biased towards males (i.e. ‘male-biased’, hereafter; turquoise shaded traits, Figure 4), but others towards females (i.e. ‘female-biased’, here- after; orange shading, Figure 4) within all functional groups. These sex-specific biases occur in mean trait sizes and also in our measures of trait variability. There were strong positive relationships between mean and variance across traits (r > 0.94 on the log scale; Figure 1—figure supplement 1), and therefore, we report the results of lnCVR, which controls for differences in means, in the main text. Results on lnVR are presented as figure supplements (Figure 4—figure supplements 1 and 2). There was no consistent pattern in which sex has more variability (lnCVR) in the examined traits (left panel in Figure 4A). Our meta-analytic results also did not support a consistent pattern of either higher male variability or higher female variability (see Figure 4B, left panel: ‘All’ indicates that across all traits and functional groups, there was no significant sex bias in variances; lnCVR = 0.005, 95% confidence interval, 95% CI = -0.009 to 0.018). However, there was high heterogeneity among traits (I2 = 76.5%, Supplementary file 1, Table 4 and see also Table 5), indicating sex differ- ences in variability are trait-dependent, corroborating our general observation that variability in some traits was male-biased but others female-biased (Figure 4A). As expected, specific functional trait groups showed significant sex-specific bias in variability (Figure 4B). The variability among traits within a functional group was lower than that of all the traits combined (Supplementary file 1, Table 4). For example, males exhibited an 8.05% increase in CV relative to females for morphological traits (lnCVR = 0.077; CI = 0.041 to 0.113, I2 = 67.3%), but CV was female-biased for immunological traits (6.59% higher in females, lnCVR = -0.068, CI = -0.098 to 0.038, I2 = 40.8%) and eye morphology (7.85% higher in females, lnCVR = -0.081, CI = -0.147 to (-0.016), I2 = 49.8%). The pattern was similar for overall sexual dimorphism in mean trait values (here, a slight male bias is indicated by larger ‘turquoise’ than ‘orange’ areas; Figure 4B, right and Figure 4B, lnRR: ‘All’, lnRR = 0.012, CI = -0.006 to 0.31). Trait means (lnRR) were 7% larger for males (lnRR = 0.067; CI =0.007 to 0.128) in morphological traits and 15.3% larger in males for metabolic traits (lnRR = 0.142; CI = 0.036 to 0.248). In contrast, females had 5.59% (lnRR = 0.057, CI = -0.107 to (-0.007)) larger means than those of males for immunological traits. We note that these meta-analytic estimates were accompanied by very large between-trait heterogeneity values (morphology I2 = 99.7%, metabolism I2 = 99.4%, immunology I2 = 96.2; see Supplementary file 1, Table 4), indicating that even within the same functional groups, the degree and direction of sex bias in the mean was not consistent among traits.

Discussion

We tested competing predictions from two hypotheses explaining why sex biases in trait variability exist. Neither the ‘greater male variability’ hypothesis nor the ‘estrus-mediated variability’ hypothesis explain the observed patterns in sex-biased trait variation on their own. Therefore, our results add further empirical weight to calls that question the basis for the routine exclusion of one sex in bio- medical research based on the estrus-mediated variability hypothesis (Flanagan, 2014; Klein et al., 2015; Prendergast et al., 2014; Shansky and Woolley, 2016; Becker et al., 2016). It is important to know that for each trait we estimated the mean effect size (i.e. lnCVR) over strains and locations. As such, our results may not necessarily apply to every group of mice, which may or may not result in stronger support for either of the two hypotheses. Greater male variability vs. estrus-mediated variability? Evolutionary biologists commonly expect greater variability in the heterogametic sex than the homo- gametic sex. In mammals, males are heterogametic, and hence are expected to exhibit higher trait variability compared to females, which is also consistent with an expectation from sexual selection theory (Reinhold and Engqvist, 2013). Our results provide only partial support for the greater male variability hypothesis, because the expected pattern only manifested for morphological traits (see Figures 4 and 5). This result corroborates a previous analysis across animals, which found that the heterogametic sex was more variable in body size (Reinhold and Engqvist, 2013). However, our data do not support the conclusion that higher variability in males occurs across all traits, including for many other morphological traits. The estrus-mediated variability hypothesis was, at least until recently (Prendergast et al., 2014; Smarr et al., 2017), regularly used as a rationale for including only male subjects in many biomedical studies. So far, we know very little about the relationship between hormonal fluctuations and general trait variability within and among female subjects. Our results are consistent with the estrus-medi- ated variability hypothesis for immunological traits only. Immune responses can strongly depend on sex hormones (Zuk and McKean, 1996; Grossman, 1989), which may explain higher female variabil- ity in these traits. However, if estrus status affects traits through variation in hormone levels, we would expect to also find higher female variability in physiological and hematological traits. This was not the case in our dataset. Interestingly, however, eye morphology (structural traits, which should fluctuate little across the estrous cycle) also appeared to be more variable in females than males, but little is known about sex differences in ocular traits in general (Wagner et al., 2008; Shaqiri et al., 2018). Overall, we find no consistent support for the female estrus-mediated variability hypothesis. In line with our findings, recent studies have refuted the prediction of higher female variability (Prendergast et al., 2014; Smarr et al., 2017; Beery and Zucker, 2011; Becker et al., 2016; Beery, 2018). For example, several rodent studies have found that males are more variable than females (Prendergast et al., 2014; Smarr et al., 2017; Becker et al., 2016; Beery, 2018; Fritz et al., 2017; Mogil and Chanda, 2005). Further studies should investigate whether higher female variability in immunological traits is indeed due to the estrous cycle, or generally because of greater between-individual variation (Figure 2). In general, we found many traits to be sexually dimorphic (Figure 5) in accordance with the previ- ous study, which used the same database (Karp et al., 2017). Although the original study also pro- vided estimates for sex differences in traits both with and without controlling for weight (we did not control for weight; Nakagawa et al., 2017). More specifically, males are larger than females, while females have higher immunological parameters (see Figure 5). Notably, the most sexually dimorphic trait means also show the greatest differences in trait variance (Figures 4 and 5). Indeed, theory pre- dicts that sexually selected traits (e.g. larger body size for males due to male-male competition) are likely more variable, as these traits are often condition-dependent (Rowe and Houle, 1996). There- fore, this sex difference in variability could be more pronounced under natural conditions compared to laboratory settings. This relationship may explain why male-biased morphological traits are larger and more variable.

Eco-evolutionary implications

We have used lnCVR values to compare phenotypic variability (CV) between the sexes. When lnCVR is used for fitness-related traits, it can signify sex differences in the ‘opportunity for selection’ between females and males (Rowe and Houle, 1996). If we assume that phenotypic variation (i.e. variability in traits) has a heritable basis, then large ratios of lnCVR may indicate differences in the evolutionary potential of each sex to respond to selection, at least in the short term (Hansen and Houle, 2008). For example, more variable morphological traits of males could potentially provide them with better capacity than females to adapt morphologically to a changing climate. We note, however, that in our study, lnCVR reflects sex differences in trait variability within strains, such that the variability differences we observe between the sexes may be partially the result of phenotypic plasticity. Demographic parameters, such as age-dependent mortality rate (Lemaıˆtre et al., 2020) can often be different for each sex. For example, a study on European sparrowhawks found that variability in mortality was higher in females compared to males (Colchero et al., 2017). In this species, sex-spe- cific variation affects age-dependent mortality and results in higher average female life expectancy. Therefore, population dynamic models, which make predictions about how populations change in their size over time, should take sex differences in variability into account to produce more accurate predictions (Caswell and Weeks, 1986; Lindstro ̈m and Kokko, 1998). In our rapidly changing world, better predictions on population dynamics are vital for understanding whether climate change is likely to result in population extinction and lead to further biodiversity loss.

Statistical and practical implications

It is now mandatory to include both sexes in biomedical experiments and clinical trials funded by the NIH, unless there exists strong justification against the inclusion of both sexes (NIH, 2015a; NIH, 2015b). In order to conduct meaningful research and make sound clinical recommendations for both male and female patients, it is necessary to understand both how trait means and variances dif- fer between the sexes. If one sex is systematically more variable in a trait of interest than the other, then experiments should be designed to accommodate relative differences in statistical power between the sexes (which has not been considered before, see Flanagan, 2014; Klein et al., 2015; Prendergast et al., 2014; Shansky and Woolley, 2016). For example, female immunological traits are generally more variable (i.e. having higher CV and SD). Therefore, in an experiment measuring immunological traits, we would need to include a larger sample (N) of females than males (Nfemale > Nmale; Ntotal = Nfemale + Nmale) to achieve the same power as when the experiment only includes males (Ntotal* = 2Nmale). In other words, in an experiment with both sexes we would need a larger sample size than the same experiment with males only (Ntotal > Ntotal*). To help researchers adjust their sex-specific sample size to achieve optimal statistical power, we provide an online tool (ShinyApp; https://bit.ly/sex-difference). This tool may serve as a starting point for checking baseline variability for each sex in mice. The sex bias (indicated by the % differ- ence between the sexes) is provided for separate traits, procedures, and functional groups. These meta-analytic results are based on our analyses of more than 2 million rodent data points, from 26,916 individual mice. We note, however, that variability in a trait measured in untreated individuals maintained under carefully standardized environmental conditions, as reported here, may not directly translate into the same variability when measured in experimentally treated individuals, or individuals exposed to a range of environments (i.e. natural populations or human cohorts). Further, these estimates are overall mean differences across strains and locations. Therefore, these may not be particularly informative if one’s experiment only includes one specific strain. Nonetheless, we point out that our estimates may be useful in the light of a recent recommendation of using ‘hetero- genization’ where many different strains are systematically included (i.e. randomized complete block design) to increase the robustness of experimental results (Voelkl et al., 2020). However, note that an experiment with heterogenization might only include a few strains with several animals per strain. Even in such a case, using just a few strains, our tool could provide potentially useful benchmarks. Incidentally, heterogenization would be key to making one’s experimental outcome more generaliz- able (Webster and Rutz, 2020). Importantly, when two groups (e.g. males and females) show differences in variability, we violate homogeneity of variance or homoscedasticity assumptions. Such a violation is detrimental because it leads to a higher Type I error rate. Therefore, we should consider incorporating heteroscedasticity (different variances) explicitly or using robust estimators of variance (also known as ‘the sandwich variance estimator’) to prevent an inflated Type I error rate (Cleasby and Nakagawa, 2011), espe- cially when we compare traits between the sexes.

Conclusion

We have shown that sex biases in variability occur in many mouse traits, but that the directions of those biases differ between traits. Neither the ‘greater male variability’ nor the ‘estrus-mediated var- iability’ hypothesis provides a general explanation for sex differences in trait variability. Instead, we have found that the direction of the sex bias varies across traits and among trait types (Figures 4 and 5). Our findings have important ecological and evolutionary ramifications. If the differences in variability correspond to the potential of each sex to respond to changes in specific environments, this sex difference needs to be incorporated into demographic and population dynamic modelling. Moreover, in the (bio-)medical field, our results should inform decisions during study design by pro- viding more rigorous power analyses that allow researchers to incorporate sex-specific differences for sample size. We believe that taking sex differences in trait variability into account will help avoid misleading conclusions and provide new insights into sex differences across many areas of biological and bio-medical research. Ultimately, such considerations will not only better our knowledge, but also close the current gaps in our biased knowledge (Tannenbaum et al., 2019).

Materials and methods

Data selection and process

The IMPC (International Mouse Phenotyping Consortium) provides a comprehensive catalogue of mammalian gene function for investigating the genetics of health and disease, by systematically col- lecting phenotypes of knock-out and wildtype mice. To investigate differences in trait variability between the sexes, we only considered the data for wildtype control mice. We retrieved the dataset from the IMPC server in June 2018 and filtered it to contain non-categorical traits for wildtype mice. The initial dataset comprised over 2,500,000 data points for 340 traits. In cases where multiple meas- urements were taken over time, data cleaning started with selecting single measurements for each individual and trait. In these cases, we selected the measurement closest to ‘100 days of age’. All data are from unstaged females (with no information about the stage of their estrous cycle). We excluded data for juvenile and unsexed mice (Figure 3A; this dataset and scripts can be found on https://rpubs.com/SusZaj/ESF; https://bit.ly/code-mice-sex-diff; raw data: https://doi.org/10.5281/ zenodo.3759701).

Grouping and effect-size calculation

We created a grouping variable called ‘population’ (Figure 3B). A population comprised a group of individuals belonging to a distinct wildtype strain maintained at one particular location (institution); populations were identified for every trait of interest. Our data were derived from 11 different loca- tions/institutions, and a given location/institution could provide data on multiple populations (see Supplementary file 1, Table 1 for details on numbers of strains and institutions). We included only populations that contained data points for at least six individuals, and which had information for members of both sexes; further, populations for a particular trait had to come from at least two insti- tutions to be eligible for inclusion. After this selection process, the dataset contained 2,300,000 data points across 232 traits. Overall, we meta-analysed traits with between 2–18 effect sizes (mean = 9.09 effects, SD = 4.47). However, each meta-analysis contained a total number of individ- ual mice that ranged from 83/91 to 13467/13449 (males/females). While a minimum of N = 6 mice were used to create effect sizes for any given group (male or female), in reality samples sizes of male/female groups were much larger (males: mean = 396.66 (SD = 238.23), median = 465.56;females: mean = 407.35 (SD = 240.31), median = 543.89). We used the function escalc in the R pack- age, metafor (Viechtbauer, 2010) to obtain lnCVR, lnVR and lnRR and their corresponding sampling variance for each trait for each population; we worked in the R environment for data cleaning, proc- essing and analyses (R Development Core Team, 2017, version 3.6.0; for the versions of all the soft- ware packages used for this article and all the details and code for the statistical analyses, see the Source code 1 and repositories). As mentioned above, the use of ratio-based effect sizes, such as lnCVR, lnVR and lnRR, controls for baseline changes over time and space, assuming that these changes affect males and females similarly. However, we acknowledge that we could not test this assumption.

Meta-analyses: overview

We conducted meta-analyses at two different levels (Figure 3C–J). First, we conducted a meta-anal- ysis for each trait for all three effect-size types (lnRR, lnVR and lnCVR), calculated at the ‘population’ level (i.e. using population as a unit of analysis). Second, we statistically amalgamated overall effect sizes estimated at each trait (i.e. overall trait means as a unit of analysis) after accounting for depen- dence among traits. In other words, we conducted second-order meta-analyses (Nakagawa et al., 2019). We used the second-order meta-analyses for three different purposes: (A) estimating overall sex biases in variance (lnCVR and lnVR) and mean (lnRR) in the nine functional groups (for details, see below) and in all these groups combined (the overall estimates); (B) visualizing heterogeneities across populations for the three types of effect size in the nine functional trait groups, which comple- mented the first set of analyses (Figure 3I, Table 6 in Supplementary file 1); and (C) when traits were found to be significantly sex-biased, grouping such traits into either male-biased and female- biased traits, and then, estimating overall magnitudes of sex bias for both sexes again for the nine functional trait groups. Only the first second-order meta-analysis (A) directly related to the testing of our hypotheses, results of B and C are found in Supplementary file 1 and figures and reported in our freely accessible code.

Meta-analyses: population as an analysis unit

To obtain degree of sex bias for each trait mean and variance (Figure 3C), we used the function rma.mv in the R package metafor (Viechtbauer, 2010) by fitting the following multilevel meta-ana- lytic model, an extension of random-effects models (sensu Nakagawa and Santos, 2012): ESi ~ 1 + (1 | Strainj) + (1 | Locationk) + (1 | Uniti) + Errori, where ‘ESi’ is the ith effect size (i.e. lnCVR, lnVR and lnRR) for each of 232 traits, the ‘1’ is the overall intercept (other ‘1’s are random intercepts for the following random effects), ‘Strainj’ is a random effect for the jth strain of mice (among nine strains), ‘Locationk’ is a random effect for the kth location (among 11 institutions), ‘Uniti’ is a residual (or effect-size level or ‘population-level’ random effect) for the ith effect size, ‘Errori’ is a random effect of the known sampling error for the ith effect size. Given the model above, meta-ana- lytic results had two components: (1) overall means with standard errors (95% confidence intervals), and (2) total heterogeneity (the sum of the three variance components, which is estimated for the random effects). Note that overall means indicate average (marginalised) effect sizes over different strains and locations, and total heterogeneities reflect variation around overall means due to differ- ent strains and locations. We excluded traits which did not carry useful information for this study (i.e. fixed traits, such as number of vertebrae, digits, ribs and other traits that were not variable across wildtype mice; note that this may be different for knock-down mutant strains) or where the meta-analytic model for the trait of interest did not converge, most likely due to small sample size from the dataset (14 traits, see SI Appendix, for details: Meta-analyses; 1. Population as analysis unit). We therefore obtained a dataset containing meta-analytic results for 218 traits, at this stage, to use for our second-order meta-analyses (Figure 3D).

Meta-analyses: accounting for correlated traits

Our dataset of meta-analytic results included a large number of non-independent traits. To account for dependence, we identified 90 out of 218 traits, and organized them into 19 trait sub-groups (containing 2–10 correlated traits, see Figure 3E). For example, many measurements (i.e. traits) from hematological and immunological assays were hierarchically clustered or overlapped with each other (e.g. cell type A, B and A+B). We combined the meta-analytic results from 90 traits into 19 meta- analytic results (Figure 3F) using the function robu in the R package robumeta with the assumption of sampling errors being correlated with the default value of r = 0.8 (Fisher et al., 2017). Conse- quently, our final dataset for secondary meta-analyses contained 147 traits (i.e. the newly condensed 19 plus the remaining 128 independent traits, see Figure 3, Supplementary file 1, Table 2), which we assume to be independent of each other. Second-order meta-analyses: trait as an analysis unit We created our nine overarching functional groups of traits (Figure 3G) by condensing the IMPC’s 26 procedural categories (‘procedures’) into related clusters. The categories were based on proce- dures that were biologically related, in conjunction with measurement techniques and the number of available traits in each category (see Supplementary file 1, Table 3 for a list of clustered traits, procedures and grouping terms). To test our two hypotheses about how trait variability changes in relation to sex, we estimated overall effect sizes for nine functional groups by aggregating meta-ana- lytic results via ‘classical’ random-effect models using the function rma.uni in the R package metafor (Viechtbauer, 2010). In other words, we conducted three sets of 10 second-order meta-analyses (i.e. meta-analyzing 3 types of effect size: lnRR, lnVR and lnCVR for nine functional groups and one for all the groups combined, Figure 3H). Although we present the frequencies of male- and female- biased traits in Figure 4A, we did not run inferential statistical tests on these counts because such tests would be considered as vote-counting, which has been severely criticised in the meta-analytic literature (Higgins, 2019).

Acknowledgements

SRKZ and ML were supported by the Australian (ARC) Discovery Grant (DP180100818) awarded to SN. JM was supported by EMBL core funding and the NIH Common Fund (UM1-H G006370). AMS was supported by an ARC fellowship (DE180101520).

References

1. Abdulla A.Ahmad
2. Michael D.Randall
3. Richard E.Roberts
2017Sex differences in the role of phospholipase A2-dependent arachidonic acid pathway in the perivascular adipose tissue function in pigs21595The Journal of Physiology66236634Wiley
- doi10.1113/jp274831
1. Jill B.Becker
2. Brian J.Prendergast
3. Jing W.Liang
2016Female rats are not more variable than male rats: a meta-analysis of neuroscience studies17Biology of Sex DifferencesSpringer Science and Business Media LLC
- doi10.1186/s13293-016-0087-5
1. Annaliese K.Beery
2018Inclusion of females does not increase variability in rodent research studies23Current Opinion in Behavioral Sciences143149Elsevier BV
- doi10.1016/j.cobeha.2018.06.016
1. Annaliese K.Beery
2. IrvingZucker
2011Sex bias in neuroscience and biomedical research335Neuroscience & Biobehavioral Reviews565572Elsevier BV
- doi10.1016/j.neubiorev.2010.07.002
1. HalCaswell
2. Daniel E.Weeks
1986Two-Sex Models: Chaos, Extinction, and Other Dynamic Consequences of Sex5128The American Naturalist707735University of Chicago Press
- doi10.1086/284598
1. Janine AustinClayton
2015Studying both sexes: a guiding principle for biomedicine230The FASEB Journal519524Wiley
- doi10.1096/fj.15-279554
1. Janine A.Clayton
2. Francis S.Collins
2014Policy: NIH to balance sex in cell and animal studies7500509Nature282283Springer Science and Business Media LLC
- doi10.1038/509282a
1. Ian R.Cleasby
2. ShinichiNakagawa
2011Neglected biological patterns in the residuals1265Behavioral Ecology and Sociobiology23612372Springer Science and Business Media LLC
- doi10.1007/s00265-011-1254-7
1. FernandoColchero
2. Alix EvaAliaga
3. Owen R.Jones
4. Dalia A.Conde
2017Individual heterogeneity determines sex differences in mortality in a monogamous bird with reversed sexual dimorphism486Journal of Animal Ecology899907Wiley
- doi10.1111/1365-2656.12677
1. JOSÉ JAVIERCUERVO
2. ANDERS PAPEMØLLER
1999Phenotypic variation and fluctuating asymmetry in sexually dimorphic feather ornaments in relation to sex and mating system468Biological Journal of the Linnean Society505529Oxford University Press (OUP)
- doi10.1111/j.1095-8312.1999.tb01186.x
1. José JavierCuervo
2. Anders PapeMøller
2001Components of phenotypic variation in avian ornamental and non-ornamental feathers115Evolutionary Ecology5372Springer Science and Business Media LLC
- doi10.1023/a:1011913804309
1. CharlesDarwin
1871The descent of man, and selection in relation to sex. By Charles Darwin ...D. Appleton and company,
- doi10.5962/bhl.title.24784
1. Mary E.Dickinson
2. Anonymous
3. Ann M.Flenniken
4. XiaoJi
5. LydiaTeboul
6. Michael D.Wong
7. Jacqueline K.White
8. Terrence F.Meehan
9. Wolfgang J.Weninger
10. HenrikWesterberg
11. HibretAdissu
12. Candice N.Baker
13. LynetteBower
14. James M.Brown
15. L. BriannaCaddle
16. FrancescoChiani
17. DaveClary
18. JamesCleak
19. Mark J.Daly
20. James M.Denegre
21. BrendanDoe
22. Mary E.Dolan
23. Sarah M.Edie
24. HelmutFuchs
25. ValerieGailus-Durner
26. AntonellaGalli
27. AlessiaGambadoro
28. JuanGallegos
29. ShiyingGuo
30. Neil R.Horner
31. Chih-WeiHsu
32. Sara J.Johnson
33. SowmyaKalaga
34. Lance C.Keith
35. LouiseLanoue
36. Thomas N.Lawson
37. MonkolLek
38. ManuelMark
39. SusanMarschall
40. JeremyMason
41. Melissa L.McElwee
42. SusanNewbigging
43. Lauryl M. J.Nutter
44. Kevin A.Peterson
45. RamiroRamirez-Solis
46. Douglas J.Rowland
47. EdwardRyder
48. Kaitlin E.Samocha
49. John R.Seavitt
50. MohammedSelloum
51. ZsomborSzoke-Kovacs
52. MasaruTamura
53. Amanda G.Trainor
54. IlincaTudose
55. ShigeharuWakana
56. JonathanWarren
57. OliviaWendling
58. David B.West
59. LeeyeanWong
60. AtsushiYoshiki
61. WolfgangWurst
62. Daniel G.MacArthur
63. Glauco P.Tocchini-Valentini
64. XiangGao
65. PaulFlicek
66. AllanBradley
67. William C.Skarnes
68. Monica J.Justice
69. Helen E.Parkinson
70. MarkMoore
71. SaraWells
72. Robert E.Braun
73. Karen L.Svenson
74. Martin Hrabede Angelis
75. YannHerault
76. TimMohun
77. Ann-MarieMallon
78. R. MarkHenkelman
79. Steve D. M.Brown
80. David J.Adams
81. K. C. KentLloyd
82. ColinMcKerlie
83. Arthur L.Beaudet
84. MajaBućan
85. Stephen A.Murray
2016High-throughput discovery of novel developmental phenotypes7621537Nature508514Springer Science and Business Media LLC
- doi10.1038/nature19356
1. David M.Dorris
2. JinyanCao
3. Jaime A.Willett
4. Caitlin A.Hauser
5. JohnMeitzen
2015Intrinsic excitability varies by sex in prepubertal striatal medium spiny neurons3113Journal of Neurophysiology720729American Physiological Society
- doi10.1152/jn.00687.2014
1. K. L.Flanagan
2014Sexual dimorphism in biomedical research: a call to analyse by sex7108Transactions of the Royal Society of Tropical Medicine and Hygiene385387Oxford University Press (OUP)
- doi10.1093/trstmh/tru079
1. Richard W.Foltin
2. Suzette M.Evans
2018Sex differences in the anorexigenic effects of dexfenfluramine and amphetamine in baboons.426Experimental and Clinical Psychopharmacology335340American Psychological Association (APA)
- doi10.1037/pha0000201
1. Ann-KristinaFritz
2. IrmgardAmrein
3. David P.Wolfer
2017Similar reliability and equivalent performance of female and male mice in the open field and water-maze place navigation task3175American Journal of Medical Genetics Part C: Seminars in Medical Genetics380391Wiley
- doi10.1002/ajmg.c.31565
1. C.Grossman
1989Possible underlying mechanisms of sexual dimorphism in the immune response, fact and hypothesis1-634Journal of Steroid Biochemistry241251Elsevier BV
- doi10.1016/0022-4731(89)90088-5
1. T. F.HANSEN
2. D.HOULE
2008Measuring and comparing evolvability and constraint in multivariate characters521Journal of Evolutionary Biology12011219Wiley
- doi10.1111/j.1420-9101.2008.01573.x
1. L.Hedges
2. ANowell
1995Sex differences in mental test scores, variability, and numbers of high-scoring individuals5220269Science4145American Association for the Advancement of Science (AAAS)
- doi10.1126/science.7604277
2019Cochrane Handbook for Systematic Reviews of InterventionsWiley
- doi10.1002/9781119536604
1. CIngvorsen
2. N AKarp
3. C JLelliott
2017The role of sex and body weight on the metabolic effects of high-fat diet in C57BL/6N mice47Nutrition & Diabetese261-e261Springer Science and Business Media LLC
- doi10.1038/nutd.2017.6
1. YuichiroItoh
2. Arthur P.Arnold
2015Are females more variable than males in gene expression? Meta-analysis of microarray datasets16Biology of Sex DifferencesSpringer Science and Business Media LLC
- doi10.1186/s13293-015-0036-8
1. WendyJohnson
2. AndrewCarothers
3. Ian J.Deary
2008Sex Differences in Variability in General Intelligence: A New Look at the Old Question63Perspectives on Psychological Science518531SAGE Publications
- doi10.1111/j.1745-6924.2008.00096.x
1. Natasha A.Karp
2. Anonymous
3. JeremyMason
4. Arthur L.Beaudet
5. YoavBenjamini
6. LynetteBower
7. Robert E.Braun
8. Steve D.M.Brown
9. Elissa J.Chesler
10. Mary E.Dickinson
11. Ann M.Flenniken
12. HelmutFuchs
13. Martin Hrabe deAngelis
14. XiangGao
15. ShiyingGuo
16. SimonGreenaway
17. RuthHeller
18. YannHerault
19. Monica J.Justice
20. NataljaKurbatova
21. Christopher J.Lelliott
22. K.C. KentLloyd
23. Ann-MarieMallon
24. Judith E.Mank
25. HiroshiMasuya
26. ColinMcKerlie
27. Terrence F.Meehan
28. Richard F.Mott
29. Stephen A.Murray
30. HelenParkinson
31. RamiroRamirez-Solis
32. LuisSantos
33. John R.Seavitt
34. DamianSmedley
35. TaniaSorg
36. Anneliese O.Speak
37. Karen P.Steel
38. Karen L.Svenson
39. ShigeharuWakana
40. DavidWest
41. SaraWells
42. HenrikWesterberg
43. ShayYaacoby
44. Jacqueline K.White
2017Prevalence of sexual dimorphism in mammalian phenotypic traits18Nature CommunicationsSpringer Science and Business Media LLC
- doi10.1038/ncomms15475
1. Sabra L.Klein
2. LondaSchiebinger
3. Marcia L.Stefanick
4. LarryCahill
5. JayneDanska
6. Geert J.de Vries
7. Melina R.Kibbe
8. Margaret M.McCarthy
9. Jeffrey S.Mogil
10. Teresa K.Woodruff
11. IrvingZucker
2015Opinion: Sex inclusion in basic research drives discovery17112Proceedings of the National Academy of Sciences52575258Proceedings of the National Academy of Sciences
- doi10.1073/pnas.1502843112
1. Jean-FrançoisLemaître
2. VictorRonget
3. MorganeTidière
4. DominiqueAllainé
5. VéraneBerger
6. AurélieCohas
7. FernandoColchero
8. Dalia A.Conde
9. MichaelGarratt
10. AndrásLiker
11. Gabriel A. B.Marais
12. AlexanderScheuerlein
13. TamásSzékely
14. Jean-MichelGaillard
2020Sex differences in adult lifespan and aging rates of mortality across wild mammals15117Proceedings of the National Academy of Sciences85468553Proceedings of the National Academy of Sciences
- doi10.1073/pnas.1911999117
1. JanLindström
2. HannaKokko
1998Sexual reproduction and population dynamics: the role of polygyny and demographic sex differences1395265Proceedings of the Royal Society of London. Series B: Biological Sciences483488The Royal Society
- doi10.1098/rspb.1998.0320
1. Jeffrey S.Mogil
2. Mona LisaChanda
2005The case for the inclusion of female subjects in basic science studies of pain1117Pain15Ovid Technologies (Wolters Kluwer Health)
- doi10.1016/j.pain.2005.06.020
1. ShinichiNakagawa
2. RobertPoulin
3. KerrieMengersen
4. KlausReinhold
5. LeifEngqvist
6. MalgorzataLagisz
7. Alistair M.Senior
2014Meta‐analysis of variation: ecological and evolutionary applications and beyond26Methods in Ecology and Evolution143152Wiley
- doi10.1111/2041-210x.12309
1. ShinichiNakagawa
2. Daniel W. A.Noble
3. Alistair M.Senior
4. MalgorzataLagisz
2017Meta-evaluation of meta-analysis: ten appraisal questions for biologists115BMC BiologySpringer Science and Business Media LLC
- doi10.1186/s12915-017-0357-7
1. ShinichiNakagawa
2. GihanSamarasinghe
3. Neal R.Haddaway
4. Martin J.Westgate
5. Rose E.O’Dea
6. Daniel W.A.Noble
7. MalgorzataLagisz
2019Research Weaving: Visualizing the Future of Research Synthesis334Trends in Ecology & Evolution224238Elsevier BV
- doi10.1016/j.tree.2018.11.007
1. ShinichiNakagawa
2. Eduardo S. A.Santos
2012Methodological issues and advances in biological meta-analysis526Evolutionary Ecology12531274Springer Science and Business Media LLC
- doi10.1007/s10682-012-9555-5
NIH. 2015a. Consideration of Sex as a Biological Variable in NIH-Funded Research, Notice NOT-OD-102: National Institutes of Health.
NIH. 2015b. Enhancing Reproducibility Through Rigor and Transparency, Notice NOT-OD-103: National Institutes of Health.
1. AnnaNowogrodzki
2017Inequality in medicine7674550NatureS18-S19Springer Science and Business Media LLC
- doi10.1038/550s18a
1. Brian J.Prendergast
2. Kenneth G.Onishi
3. IrvingZucker
2014Female mice liberated for inclusion in neuroscience and biomedical research40Neuroscience & Biobehavioral Reviews15Elsevier BV
- doi10.1016/j.neubiorev.2014.01.001
R Development Core Team. 2017. R: A Language and Environment for Statistical Computing. Vienna, Austria, R Foundation for Statistical Computing. http://www.r-project.org.
1. KlausReinhold
2. LeifEngqvist
2013THE VARIABILITY IS IN THE SEX CHROMOSOMES1267Evolution36623668Wiley
- doi10.1111/evo.12224
1996The lek paradox and the capture of genetic variance by condition dependent traits1375263Proceedings of the Royal Society of London. Series B: Biological Sciences14151421The Royal Society
- doi10.1098/rspb.1996.0207
1. Alistair M.Senior
2. WolfgangViechtbauer
3. ShinichiNakagawa
2020Revisiting and expanding the meta‐analysis of variation: The log coefficient of variation ratio411Research Synthesis Methods553567Wiley
- doi10.1002/jrsm.1423
1. Rebecca M.Shansky
2. Catherine S.Woolley
2016Considering Sex as a Biological Variable Will Be Valuable for Neuroscience Research4736The Journal of Neuroscience1181711822Society for Neuroscience
- doi10.1523/jneurosci.1390-16.2016
1. AlbulenaShaqiri
2. MayaRoinishvili
3. LukaszGrzeczkowski
4. EkaChkonia
5. KarinPilz
6. ChristineMohr
7. AndreasBrand
8. MarinaKunchulia
9. Michael H.Herzog
2018Sex-related differences in vision are heterogeneous18Scientific ReportsSpringer Science and Business Media LLC
- doi10.1038/s41598-018-25298-8
1. Benjamin L.Smarr
2. Azure D.Grant
3. IrvingZucker
4. Brian J.Prendergast
5. Lance J.Kriegsfeld
2017Sex differences in variability across timescales in BALB/c mice18Biology of Sex DifferencesSpringer Science and Business Media LLC
- doi10.1186/s13293-016-0125-3
1. CaraTannenbaum
2. Robert P.Ellis
3. FriederikeEyssel
4. JamesZou
5. LondaSchiebinger
2019Sex and gender analysis improves science and engineering7781575Nature137146Springer Science and Business Media LLC
- doi10.1038/s41586-019-1657-6
1. Loren P.Thompson
2. LingChen
3. Brian M.Polster
4. GerardPinkas
5. HongSong
2018Prenatal hypoxia impairs cardiac mitochondrial and ventricular function in guinea pig offspring in a sex-related manner6315American Journal of Physiology-Regulatory, Integrative and Comparative PhysiologyR1232-R1241American Physiological Society
- doi10.1152/ajpregu.00224.2018
1. Joseph L.Tomkins
2. JacekRadwan
3. Janne S.Kotiaho
4. TomTregenza
2004Genic capture and resolving the lek paradox619Trends in Ecology & Evolution323328Elsevier BV
- doi10.1016/j.tree.2004.03.029
1. WolfgangViechtbauer
2010Conducting Meta-Analyses inRwith themetaforPackage336Journal of Statistical SoftwareFoundation for Open Access Statistic
- doi10.18637/jss.v036.i03
1. HeidiWagner
2. Barbara A.Fink
3. KarlaZadnik
2008Sex- and gender-based differences in healthy and diseased eyes1179Optometry - Journal of the American Optometric Association636652Elsevier BV
- doi10.1016/j.optm.2008.01.024
1. Michael M.Webster
2. ChristianRutz
2020How STRANGE are your study animals?7812582Nature337340Springer Science and Business Media LLC
- doi10.1038/d41586-020-01751-5
1. IrvingZucker
2. Brian J.Prendergast
2020Sex differences in pharmacokinetics predict adverse drug reactions in women111Biology of Sex DifferencesSpringer Science and Business Media LLC
- doi10.1186/s13293-020-00308-5
1. MarleneZuk
2. Kurt A.McKean
1996Sex differences in parasite infections: Patterns and processes1026International Journal for Parasitology10091024Elsevier BV
- doi10.1016/s0020-7519(96)80001-4