BIRC6 modifies risk of invasive bacterial infection in Kenyan children

  1. Department of Paediatrics, University of Oxford
    OxfordUnited Kingdom
  2. MRC–Weatherall Institute of Molecular Medicine, University of Oxford
    OxfordUnited Kingdom
  3. Wellcome Centre for Human Genetics, University of Oxford
    OxfordUnited Kingdom
  4. KEMRI-Wellcome Trust Research Programme, Centre for Geographic Medicine Research-Coast
    KilifiKenya
  5. Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford
    OxfordUnited Kingdom
  6. Mahidol Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University
    Bangkok
  7. Institute of Computer Science, University of Tartu
    TartuEstonia
  8. Wellcome Sanger Institute
    CambridgeUnited Kingdom
  9. The Jenner Institute, University of Oxford
    OxfordUnited Kingdom
  10. Division of Medicine, Imperial College
    LondonUnited Kingdom
  11. Department of Infectious Disease Epidemiology, London School of Hygiene & Tropical Medicine
    LondonUnited Kingdom
  12. Institute for Global Health Innovation, Department of Surgery and Cancer, Imperial College
    LondonUnited Kingdom

Abstract

Invasive bacterial disease is a major cause of morbidity and mortality in African children. Despite being caused by diverse pathogens, children with sepsis are clinically indistinguishable from one another. In spite of this, most genetic susceptibility loci for invasive infection that have been discovered to date are pathogen specific and are not therefore suggestive of a shared genetic architecture of bacterial sepsis. Here, we utilise probabilistic diagnostic models to identify children with a high probability of invasive bacterial disease among critically unwell Kenyan children with Plasmodium falciparum parasitaemia. We construct a joint dataset including 1445 bacteraemia cases and 1143 severe malaria cases, and population controls, among critically unwell Kenyan children that have previously been genotyped for human genetic variation. Using these data, we perform a cross-trait genome-wide association study of invasive bacterial infection, weighting cases according to their probability of bacterial disease. In doing so, we identify and validate a novel risk locus for invasive infection secondary to multiple bacterial pathogens, that has no apparent effect on malaria risk. The locus identified modifies splicing of BIRC6 in stimulated monocytes, implicating regulation of apoptosis and autophagy in the pathogenesis of sepsis in Kenyan children.

Introduction

Invasive bacterial diseases are a major cause of child morbidity and mortality in Africa 8Berkley et al.2005. Although improved control measures, including the rollout of anti-bacterial conjugate vaccines 13Cowgill et al.200652Silaba et al.2019, have led to recent declines, the burden of conditions such as pneumonia, meningitis, and sepsis secondary to bacterial pathogens remains significant 55Vos2020. A better understanding of the biology of invasive bacterial infections in African populations might help the development of novel interventions.

Susceptibility to invasive bacterial infections varies widely between individuals. In African children, some of this variability is determined by acquired comorbidities such as HIV, malnutrition, and malaria 8Berkley et al.200511Church and Maitland201451Scott et al.2011. However, the identification of common genetic variants as determinants of bacterial infection suggests that a significant portion of this variability is heritable. Many of these genetic susceptibility loci have pathogen-specific effects 15Davila et al.201024Gilchrist et al.201850Rautanen et al.2016, which is consistent with our understanding of infection susceptibility derived from primary immunodeficiencies. Key examples of pathogen specificity among primary immunodeficiencies include Mendelian susceptibility to mycobacterial disease and susceptibility to non-tuberculous mycobacteria and nontyphoidal Salmonella09Bustamante et al.2014, terminal complement deficiencies and meningococcal disease 21Figueroa et al.1993, and IRAK4 deficiency and pneumococcal disease 48Picard et al.2007. A major exception to this is the rs334 A>T mutation in HBB (sickle haemoglobin), which is associated with raised and lowered risks of infection secondary to a broad range of pathogens among homozygotes 59Williams et al.2009 and heterozygotes 51Scott et al.2011, respectively. However, the effect sizes associated with sickle haemoglobin are extreme, being maintained in populations by balancing selection, and larger sample sizes will probably be needed for the discovery of new variants with multi-pathogen effects.

Because the clinical features of invasive bacterial infections and severe malaria are broadly similar 6Bejon et al.2007, it can be difficult to distinguish between severe illness caused by extensive sequestration of malaria parasites in the microvasculature (‘true’ severe malaria) and bacterial sepsis in the presence of incidental parasitaemia on the basis of clinical features alone. This is made harder still by the fact that antibiotic pre-treatment and inadequate blood culture volumes mean that, even in settings with excellent diagnostic facilities, true invasive bacterial infections can often not be confirmed 18Driscoll et al.2017. Recently, we illustrated this clinical complexity through a study in which we used probabilistic models based on malaria-specific biomarkers to show that approximately one-third of children recruited to studies in Africa with a clinical diagnosis of severe malaria were actually suffering from other conditions 57Watson et al.202158Watson et al.2021.

In the current study, we extend this work to show that invasive bacterial infections are common in children admitted to hospital with a clinical diagnosis of severe malaria, but in whom biomarkers subsequently suggest that malaria was probably not the primary cause for their severe illness. We then construct a dataset of genome-wide genotyped samples from 5400 Kenyan children, comprising critically unwell Kenyan children with bacteraemia 50Rautanen et al.2016 and severe malaria 4Band2019, and population controls. Using this dataset, we perform a genome-wide association study (GWAS) of invasive bacterial infection in Kenyan children, weighting cases according to the probability that their disease was mediated by a bacterial pathogen. In doing so, we increase our study power and identify BIRC6 as a novel genetic determinant of invasive bacterial disease in Kenyan children.

Results

Severe malaria probability and risk of bacteraemia

Children admitted to the high dependency ward of Kilifi County Hospital with a clinical diagnosis of severe malaria, defined as a severe febrile illness in the presence of Plasmodium falciparum parasitaemia (n=2200), between 11 June 1995 and 12 June 2008 were included in the study. While this definition is sensitive it is not specific, meaning that our study will have included some children with sepsis accompanied by incidental parasitaemia 57Watson et al.2021. We therefore used two probabilistic models, which included either platelet counts and plasma _Pf_HRP2 concentrations (Model 1, n=1400) or white blood cell and platelet counts (Model 2, n=2200), to determine the likelihood of ‘true’ severe malaria among these children. The estimated probabilities of ‘true’ severe malaria using each model were well correlated (). Of 1400 children with a clinical diagnosis of severe malaria with measured plasma _Pf_HRP2 concentrations, 425 (30.4%, Figure 1A and B) had a low probability (P(SM|Data)<0.5) of having ‘true’ severe malaria (941 of 2220 children using WBC and platelet count data, Figure 1—figure supplement 1A and B). That is, while they presented with febrile illness and concomitant malaria parasitaemia, it is unlikely that their illnesses were directly attributable to malaria.

#' @width 25
#' @height 15

library(dplyr)
library(ggplot2)
library(patchwork)
library(RColorBrewer)
library(ggrepel)
library(ggplotify)
library(nnet)
library(coloc)

cols <- brewer.pal(8,"Set2")
cols2 <- brewer.pal(8,"Dark2")
cols3 <- brewer.pal(11,"RdYlBu")

ps <- read.table("Fig1_data.txt", header = T)

#Plot PfHRP2 concentration vs platelet count - colour by probability 'true' severe malaria, i.e. Model 1
prob_hrp2 = ggplot(ps, aes(log10(platelet), log10(hrp2))) + 
  geom_point(aes(colour = factor(p2_fifths)), size = 2) + scale_colour_manual(values =cols3[c(5,4,3,2,1)], na.translate=FALSE, name = "P(SM | Data)", labels = c("\u2264 0.2", "0.2-0.4", "0.4-0.6", "0.6-0.8", "> 0.8")) +
  theme_bw() +
  scale_y_continuous(breaks=c(0,1,2,3,4,5),
                   labels=c(1,10,expression(10^2), expression(10^3), expression(10^4), expression(10^5))) +
  scale_x_continuous(breaks=c(log10(10),log10(20), log10(50), log10(200), log10(1000)),
                     labels=c(10,20,50,200,1000)) +
  labs(x = expression(paste("Platelet count (x", 10^9,"/L)")),
        y = "PfHRP2 (ng/ml)") +
  theme(axis.text=element_text(size=12),
        axis.title=element_text(size=15),
        legend.text=element_text(size=12),
        legend.title=element_text(size=15),
        plot.title=element_text(size=50, face = "bold"))

#Plot white cell count vs platelet count - colour by probability 'true' severe malaria, i.e. Model 2

prob_wcc = ggplot(ps, aes(log10(platelet), log10(wbc))) + 
  geom_point(aes(colour = factor(p1_fifths)), size = 2) + scale_colour_manual(values =cols3[c(5,4,3,2,1)], na.translate=FALSE, name = "P(SM | Data)", labels = c("\u2264 0.2", "0.2-0.4", "0.4-0.6", "0.6-0.8", "> 0.8")) +
  theme_bw() +
  scale_y_continuous(breaks=c(log10(1),log10(2),log10(5), log10(10), log10(20), log10(50), log10(100)),
                     labels=c(1,2,5,10, 20,50,100)) +
  scale_x_continuous(breaks=c(log10(10),log10(20), log10(50), log10(200), log10(1000)),
                     labels=c(10,20,50,200,1000)) +
  labs(x = expression(paste("Platelet count (x", 10^9,"/L)")),
       y = expression(paste("White cell count (x", 10^9,"/L)"))) +
  theme(axis.text=element_text(size=12),
        axis.title=element_text(size=15),
        legend.text=element_text(size=12),
        legend.title=element_text(size=15),
        plot.title=element_text(size=50, face = "bold"))

cols <- brewer.pal(11,"RdYlBu")

#Distribution of 'true' malaria probabilities: Model 1

hist_hrp2<-ggplot(ps, aes(x=P_SM2, fill=factor(p2_fifths), colour=factor(p2_fifths))) + 
  geom_histogram(binwidth=0.012, alpha=0.72) +
  scale_fill_manual(values = cols[c(5,4,3,2,1)]) +
  scale_colour_manual(values = cols[c(5,4,3,2,1)]) +
  theme_bw() + xlab("P(SM | Data)") + theme(legend.position = "none") +
  geom_hline(yintercept=0, color=cols[8]) +
  theme(axis.text=element_text(size=12),
        axis.title=element_text(size=15))

#Distribution of 'true' malaria probabilities: Model 2

hist_wcc<-ggplot(ps, aes(x=P_SM1, fill=factor(p1_fifths), colour=factor(p1_fifths))) + 
  geom_histogram(binwidth=0.012, alpha=0.72) +
  scale_fill_manual(values = cols[c(5,4,3,2,1)]) +
  scale_colour_manual(values = cols[c(5,4,3,2,1)]) +
  theme_bw() + xlab("P(SM | Data)") + theme(legend.position = "none") +
  geom_hline(yintercept=0, color=cols[8]) +
  theme(axis.text=element_text(size=12),
        axis.title=element_text(size=15))

cols <- brewer.pal(8,"Set2")
cols2 <- brewer.pal(8,"Dark2")
cols3 <- brewer.pal(11,"RdYlBu")

#PfHRP2 concentration as a predictor of bacteraemia

p_labels = data.frame(expt = c("bloodculture_pos"), label = c("italic(P)==9.62 %*% 10^-6"))
hrp2.box = ggplot(ps, aes(x=factor(bloodculture_pos), y=log10(hrp2))) +
geom_dotplot(binaxis="y", binwidth=0.02, stackdir="center", alpha = 0.75) + geom_boxplot(alpha = 0.5) +
scale_x_discrete(breaks=c(0,1), labels = c("No", "Yes")) + aes(fill = factor(bloodculture_pos), col=factor(bloodculture_pos)) + scale_fill_manual(values = cols[c(8,3)]) + scale_colour_manual(values = cols[c(8,3)]) +
ylab("PfHRP2 (ng/ml)") + xlab("Bacteraemia") +
scale_y_continuous(breaks=c(0,1,2,3,4,5),
                     labels=c(1,10,expression(10^2), expression(10^3), expression(10^4), expression(10^5))) +
theme_bw() + theme(legend.position = "none", axis.text=element_text(size=12),
                             axis.title=element_text(size=15)) + 
  geom_text(x = 1.5, y = Inf, vjust=1.5, aes(label=label), data=p_labels, parse=TRUE, inherit.aes=F, size = 5)


#Platelet count as a predictor of bacteraemia

p_labels = data.frame(expt = c("bloodculture_pos"), label = c("italic(P)==0.0144"))
plt.box = ggplot(ps, aes(x=factor(bloodculture_pos), y=log10(platelet))) +
  geom_dotplot(binaxis="y", binwidth=0.01, stackdir="center", alpha = 0.75) + geom_boxplot(alpha = 0.5) +
  scale_x_discrete(breaks=c(0,1), labels = c("No", "Yes")) + aes(fill = factor(bloodculture_pos), col=factor(bloodculture_pos)) + scale_fill_manual(values = cols[c(8,3)]) + scale_colour_manual(values = cols[c(8,3)]) +
  ylab(expression(paste("Platelet count (x", 10^9,"/L)"))) + xlab("Bacteraemia") +
  scale_y_continuous(breaks=c(log10(10),log10(20), log10(50), log10(200), log10(1000)),
                     labels=c(10,20,50,200,1000)) +
  theme_bw() + theme(legend.position = "none", axis.text=element_text(size=12),
                     axis.title=element_text(size=15)) + 
  geom_text(x = 1.5, y = Inf, vjust=2, aes(label=label), data=p_labels, parse=TRUE, inherit.aes=F, size = 5)

#White cell count as a predictor of bacteraemia

p_labels = data.frame(expt = c("bloodculture_pos"), label = c("NS"))
wcc.box = ggplot(ps, aes(x=factor(bloodculture_pos), y=log10(wbc))) +
  geom_dotplot(binaxis="y", binwidth=0.01, stackdir="center", alpha = 0.75) + geom_boxplot(alpha = 0.5) +
  scale_x_discrete(breaks=c(0,1), labels = c("No", "Yes")) + aes(fill = factor(bloodculture_pos), col=factor(bloodculture_pos)) + scale_fill_manual(values = cols[c(8,3)]) + scale_colour_manual(values = cols[c(8,3)]) +
  ylab(expression(paste("White cell count (x", 10^9,"/L)"))) + xlab("Bacteraemia") +
  scale_y_continuous(breaks=c(log10(1),log10(2),log10(5), log10(10), log10(20), log10(50), log10(100)),
                     labels=c(1,2,5,10, 20,50,100)) +
  theme_bw() + theme(legend.position = "none", axis.text=element_text(size=12),
                     axis.title=element_text(size=15)) + 
  geom_text(x = 1.5, y = Inf, vjust=2, aes(label=label), data=p_labels, parse=TRUE, inherit.aes=F, size = 5)


Figure1 <- ((prob_hrp2|hist_hrp2)/(plt.box|hrp2.box)) + plot_layout(heights = c(1, 1)) +
  plot_annotation(tag_levels = 'A')

Figure1

_Pf_HRP2 and platelet count as predictors of severe malaria.

(A) Distribution of _Pf_HRP2 concentrations and platelet count among Kenyan children (n=1400) with a clinical diagnosis of severe malaria. Points are coloured according to the probability of ‘true’ severe malaria given the data. (B) Distribution of ‘true’ severe malaria probabilities estimated from platelet count and plasma _Pf_HRP2 concentrations. (C) Platelets counts in children with a clinical diagnosis of severe malaria with and without concomitant bacteraemia. (D) _Pf_HRP2 concentrations in children with a clinical diagnosis of severe malaria with and without concomitant bacteraemia. Box and whisker plots; boxes depict the upper and lower quartiles of the data, and whiskers depict the range of the data excluding outliers (outliers are defined as data-points > 1.5 × the inter-quartile range from the upper or lower quartiles). Comparisons of platelet count and plasma _Pf_HRP2 concentrations in children with and without bacteraemia were performed with logistic regression.

#' @width 25
#' @height 15

Figure1.supp1 <- ((prob_wcc|hist_wcc)/(plt.box|wcc.box))+ plot_layout(heights = c(1, 1)) +
  plot_annotation(tag_levels = 'A')

Figure1.supp1

White blood cell and platelet count as predictors of severe malaria.

(A) Distribution of white blood cell and platelet count among Kenyan children (n=2200) with a clinical diagnosis of severe malaria. Points are coloured according to the probability of ‘true’ severe malaria given the data. (B) Distribution of ‘true’ severe malaria probabilities estimated from platelet count and white blood cell count. (C) Platelets counts in children with a clinical diagnosis of severe malaria with and without concomitant bacteraemia. (D) White blood cell counts in children with a clinical diagnosis of severe malaria with and without concomitant bacteraemia. Box and whisker plots; boxes depict the upper and lower quartiles of the data, and whiskers depict the range of the data excluding outliers (outliers are defined as data-points > 1.5 × the inter-quartile range from the upper or lower quartiles). Comparisons of platelet and white cell count in children with and without bacteraemia were performed with logistic regression.

In keeping with the hypothesis that a significant proportion of these critically unwell children represented culture-negative invasive bacterial disease (Figure 1), in-patient mortality was higher among children with a low than a high probability of ‘true’ severe malaria (Table 1; , 95% CI , , 95% CI , ). This was also reflected in the rates of concurrent bacteraemia (Table 1; , 95% CI , , 95% CI , ). Similarly, the constituents of Model 1 were each associated with blood culture positivity, both higher platelet counts (OR = 2.36, 95% CI 1.19–4.70, ) and lower _Pf_HRP2 levels (, ) both being associated with the risk of coincident bacteraemia (Figure 1C and D). Conversely, white blood counts in isolation were not associated with risk of concurrent bacteraemia (Figure 1—figure supplement 1). Plasma _Pf_HRP2 is the single best biomarker for severe malaria 28Hendriksen et al.2012. In light of this, and given the greater enrichment for concurrent bacteraemia among children with a low probability of ‘true’ severe malaria as calculated by Model 1 than Model 2, we used Model 1 probabilities in downstream analyses where available (n=1400) and used Model 2 probabilities for all other cases (n=800).

Demographics and clinical characteristics of Kenyan children with severe malaria.

Model Numbers Sex (female) Age (months) Bacteraemia Mortality
_Pf_HRP2/Plt Total (n=1400) 695 (49.6%) 29 (17–44) 51 (3.6%) 155 (11.1%)
P(SM|Data)>0.5 (n=975) 497 (51.0%) 29 (18–45) 23 (2.4%) 94 (7.4%)
P(SM|Data)<0.5 (n=425) 198 (46.6%) 28 (16–43) 28 (6.6%) 61 (14.4%)
WBC/Plt Total (n=2220) 1,074 (48.4%) 28 (15–43) 78 (3.5%) 256 (11.6%)
P(SM|Data)>0.5 (n=1279) 623 (48.7%) 29 (17–44) 32 (2.5%) 106 (8.4%)
P(SM|Data)<0.5 (n=941) 451 (47.9%) 25 (13–40) 46 (4.9%) 150 (15.9%)

GWAS of invasive bacterial disease in Kenyan children

Children with a clinical diagnosis of severe malaria but a low probability of having ‘true’ severe malaria are thus enriched for invasive bacterial disease. Using genome-wide genotyping data from cases of culture-confirmed bacteraemia and healthy controls, we estimated SNP heritability of bacteraemia in this population to be 19% (95% CI 3–35%, ). Despite this, our GWAS of bacteraemia failed to identify genetic correlates of bacteraemia risk beyond the sickle cell locus 50Rautanen et al.2016. Motivated by these observations, we performed a GWAS of invasive bacterial infection in Kenyan children in which we included both children with culture-confirmed bacteraemia and children with a clinical diagnosis of severe malaria. Children admitted to Kilifi County Hospital between 1 August 1998 and 30 October 2010 with community-acquired bacteraemia were recruited to the study as well as children from the severe malaria study described above. Control children were recruited from the same population between 1 August 2006 and 30 December 2010 as described in detail previously 51Scott et al.2011.

Following quality control measures (see Materials and methods), we included 1445 cases of culture-confirmed bacteraemia, 1143 cases of severe malaria, and 2812 control children in our current analysis (Table 2, Figure 2). To account for the varying proportion of invasive bacterial disease among severe malaria cases, we applied weights to our regression analysis to reflect the greater likelihood of invasive bacterial disease among children with a low probability of ‘true’ severe malaria (sample weight, ). Where _Pf_HRP2 concentrations were available (n=909) we used _Pf_HRP2 and platelet count to determine while we used white cell and platelet counts (n=234) in cases where they were not available. Cases with culture-proven bacteraemia and control samples were assigned a sample weight of . Inclusion of the six major principal components (PCs) of genotyping data and genotyping platform as covariates in the model adequately controlled for confounding variation (, Figure 3—figure supplement 1). In that analysis we found evidence supporting an association between risk of invasive bacterial disease in Kenyan children and seven SNPs at a single locus on chromosome 2 (peak SNP: rs183868412:T, OR = 2.13, 95% CI 1.65–2.74, ) (Figure 3, Table 3). Fine mapping of this association identified a credible set of seven SNPs with a 95% probability of containing the causal variant (Table 3), spanning a 212 kb region: chr2:32,402,640–32,614,746.

Study sample recruitment.

Recruitment of severe malaria cases (n=1143), discovery analysis bacteraemia cases (n=1445), and replication analysis bacteraemia cases (n=434) between 1998 and 2010. The time period during which control samples (n=2812) were recruited is also highlighted (black bar). Sample numbers represent children with genome-wide genotype data, who passed quality control filters and were included in this study.

#' @width 25
#' @height 15
#thinned example summary pvalues 1:100 SNPs

disc.add <- read.table("example.pval.gz", header = F)
colnames(disc.add) <- c("chr", "bp", "rsid", "p")
#str(disc.add)

disc.add$chr <- factor(disc.add$chr, levels = c(1:22))
levels(disc.add$chr) <- c(1:22)

#highlight chr2 hit
disc.add$label <- NA
disc.add$label[which(disc.add$rsid=="rs183868412:T")] <- "BIRC6"

 
disc.add$anno <- NA
disc.add$anno[which(disc.add$rsid=="rs183868412:T")] <- 1

#data for manahattan
don.1 <- disc.add %>% 
  group_by(chr) %>% 
  summarise(chr_len=max(bp)) %>% 
  mutate(tot=cumsum(as.numeric(chr_len))-chr_len) %>%
  dplyr::select(-chr_len) %>%
  left_join(disc.add, ., by=c("chr"="chr")) %>%
  arrange(chr, bp) %>%
  mutate( BPcum=bp+tot)

axisdf = don.1 %>% group_by(chr) %>% summarize(center=( max(BPcum) + min(BPcum) ) / 2 )

cols <- brewer.pal(8,"Set2")
cols2 <- brewer.pal(8,"Dark2")
cols3 <- brewer.pal(8,"Paired")

#Manhattan plot
ggplot(don.1, aes(x=BPcum, y=-log10(p))) +
  geom_point( aes(color=as.factor(chr)), size=2) +
  scale_color_manual(values = rep(c(cols3[2], cols3[1]), 11 )) +
  geom_text_repel( data=subset(don.1, anno==1), aes(label=label), size=5, min.segment.length = unit(0, 'lines'),
                   nudge_y = 1) +
  #gwas sig line
  geom_segment(aes(x = 10177, y = 7.30103, xend = 2879943885, yend = 7.30103), data = don.1, linetype="dashed", color = "red") +
  scale_x_continuous( label = axisdf$chr[c(1:18,20,22)], breaks= axisdf$center[c(1:18,20,22)] ) +
  scale_y_continuous( labels = c("0","2","4","6","8","10"), breaks = c(0,2,4,6,8,10), expand = c(0, 0), limits= c(0,12)) +     # remove space between plot area and x axis
  xlab("chromosome") +
  theme_bw() +
  theme( 
    legend.position="none",
    panel.border = element_blank(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),axis.text=element_text(size=15),
    axis.title=element_text(size=20)
  )

Manhattan plot of invasive bacterial infection in Kenyan children.

Evidence for association with invasive bacterial disease at genotyped and imputed autosomal SNPs and indels (n=14,010,600) in Kenyan children (bacteraemia cases =1445, severe malaria cases =1143, controls = 2812). Association statistics represent additive association. The red, dashed line denotes .

Genome-wide association analysis quality control.

(A) Principal components of genome-wide genotyping data in discovery samples. Individuals are colour-coded according to genotyping platform; Affymetrix SNP 6.0 in purple, Illumina Omni 2.5M in orange. (B) Principal components of genome-wide genotyping data in discovery samples. Individuals are colour-coded according to self-reported ethnicity; Chonyi in red, Giriama in blue, Kauma in green, and other in grey.

#' @width 10
#' @height 10

df <- data.frame(observed = -log10(sort(na.omit(disc.add$p))),
    expected = -log10(ppoints(length(na.omit(disc.add$p)))))
  
  log10Pe <- expression(paste("Expected -log"[10], plain(P)))
  log10Po <- expression(paste("Observed -log"[10], plain(P)))
  
  chisq <- qchisq(1-disc.add$p,1)
  #median(chisq)/qchisq(0.5,1)
  
  lambda <- paste("lambda == ", round(median(chisq)/qchisq(0.5,1),3))

#QQ plot
Figure3.supp1c <- ggplot(df) +
  geom_abline(intercept = 0, slope = 1, size = 1, colour = cols3[6]) +
  geom_point(aes(expected, observed), size = 3, colour = cols3[2]) +
  xlab(log10Pe) +
  ylab(log10Po) +
  annotate("text", x = 1, y = 9, label = lambda, parse = TRUE, size = 10) +
  theme_bw() + ylim(NA, 10) + 
  theme( 
    legend.position="none",
    panel.border = element_blank(),
    panel.grid.major.x = element_blank(),
    panel.grid.minor.x = element_blank(),axis.text=element_text(size=15),
    axis.title=element_text(size=20), plot.title=element_text(size=30, face = "bold", hjust = 0.5))
  
Figure3.supp1c
[{type:'Heading',depth:3,content:['Quantile-quantile plot of invasive bacterial infection in Kenyan children.']},{type:'Paragraph',content:['QQ plot of weighted logistic regression genome-wide association study (GWAS) of invasive bacterial disease in Kenyan children (2588 cases, 2812 controls).']}]

Principal components of genome-wide genotyping data in discovery samples.

Outlier samples identified by ABERRANT (n=22) are highlighted (orange).

Sensitivity analysis of case weights.

Comparison of effect sizes (odds ratios and 95% confidence intervals) of the rs183868412:T association with invasive bacterial infection in the combined analysis (bacteraemia cases = 1445, severe malaria cases = 1143, controls = 2812), and restricted to cases with weights calculated with _Pf_HRP2 plasma concentration and platelet count (Model 1; bacteraemia cases = 1445, severe malaria cases = 909, controls = 2812) or white cell count and platelet count (Model 2; bacteraemia cases = 1445, severe malaria cases = 909, controls = 2812). Effect estimates were calculated with weighted logistic regression.

#' @width 15
#' @height 10
#Genotyping concordance data
concord <- read.table("genotyping.concord.txt", header = TRUE)

cols <- brewer.pal(8,"Set2")
cols2 <- brewer.pal(8,"Dark2")
cols3 <- brewer.pal(8,"Paired")

ggplot(data=concord, aes(pairwise)) + 
  geom_histogram(breaks=seq(0.8, 1, by = 0.001), col = cols3[2]) + 
  labs(x="pairwise concordance", y="count") + 
  xlim(c(0.8,1.01)) +
  theme_bw() +
  theme( 
    legend.position="none",
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),axis.text=element_text(size=15),
    axis.title=element_text(size=20))

Genotyping concordance between Illumina and Affymetrix platforms.

Pairwise genotyping concordance between samples genotyped on both Affymetrix SNP 6.0 and Illumina Omni 2.5M platforms.

Demographics and clinical characteristics of genome-wide association study (GWAS) study samples.

Severe malaria subtypes Concurrent infection
Numbers Sex (female) Age (months) SMA CM Malaria Bacteraemia Mortality
Bacteraemia (overall) 1445 614 (43%) 14 (5–34) 94 (12%) 358 (26%)
Acinetobacter 118 45 (38%) 13 (3–28) 11 (13%) 12 (10%)
β-Haemolytic streptococci 130 60 (46%) 5 (1–20) 6 (8%) 37 (30%)
Escherichia coli 141 58 (41%) 11 (6–25) 12 (15%) 45 (34%)
Hib 113 53 (47%) 12 (5–25) 3 (8%) 29 (26%)
NTS 159 75 (47%) 15 (9–26) 15 (25%) 31 (20%)
Streptococcus pneumoniae 390 151 (39%) 23 (9–61) 20 (9%) 86 (23%)
Staphylococcus aureus 152 64 (42.1%) 26 (9–88) 15 (15%) 22 (15%)
Other 242 110 (46%) 10 (1–28) 10 (8%) 96 (41%)
Malaria (overall) 1143 559 (49%) 27 (16–41) 298 (26%) 697 (61%) 40 (4%) 140 (12%)
P(SM|Data)<0.5 375 176 (47%) 28 (17–42) 62 (17%) 262 (70%) 23 (6%) 62 (17%)
P(SM|Data)>0.5 768 383 (50%) 26 (15–40) 236 (31%) 435 (57%) 17 (2%) 17 (2%)
Controls (SNP 6.0) 1895 955 (50%) *
Controls (Omni 2.5M) 917 451 (49.2%)

95% credible SNP set of invasive bacterial disease association.

SNP Effect allele Chr BP MAF Info score OR (95% CI) p-Value
rs183868412 T 2 32,478,169 0.021 0.956 2.13 (1.65–2.74)
rs139827594 G 2 32,402,640 0.02 0.966 2.12 (1.65–2.73)
rs144257579 G 2 32,507,619 0.021 0.954 2.11 (1.64–2.72)
rs145056232 C 2 32,503,024 0.021 0.955 2.11 (1.64–2.72)
rs145315025 G 2 32,502,654 0.021 0.955 2.11 (1.64–2.72)
rs143909151 T 2 32,531,452 0.021 0.962 2.11 (1.64–2.71)
rs150430979 T 2 32,614,746 0.021 0.955 2.11 (1.64–2.72)

To address the possibility that the observed association at this locus is driven by confounding secondary to population structure, we used ABERRANT 7Bellenguez et al.2012 to define a set of outlier samples on the first two PCs of genotyping data (n=22, Figure 3—figure supplement 2). These individuals are all genotyped on the Illumina Omni 2.5M array and are both cases and controls (15 and 7, respectively). While they are predominantly individuals with less common self-reported ethnicities in our study population (19 of 22 are not Giriama, Chonyi, or Kauma), they are not representative of a single self-reported ethnicity (the most common single ethnicity in this group is Digo, n=7). Excluding these samples from the association analysis at rs183868412:T did not significantly alter the association with invasive bacterial infection (, OR = 2.05, 95% CI 1.59–2.64). We further estimated the effect of rs183868412:T on invasive bacterial disease risk in four subpopulations defined by self-reported ethnicity (Giriama, ; Chonyi, ; Kauma, ; Other, ). Within each subpopulation, we tested for association between genotype and case status in a weighted logistic regression model, including platform as a categorial covariate (Table 4). The minor allele frequency (MAF) at rs183868412 ranged from 0.016 (Giriama) to 0.037 (Kauma), with no evidence of differentiation between subpopulations (). We observed consistent effect sizes in both of the major study subpopulations; Giriama (OR =1.97, 95% CI 1.30–3.01, ) and Chonyi (OR = 2.18, 95% CI 1.34–3.54, ) samples (Table 4), which together make up 83% of the study samples. Genotype at rs183868412 was also associated with invasive bacterial disease among the less common self-reported ethnicities grouped together (OR = 2.46, 95% CI 1.01–5.96, ). Genotype was not associated with invasive bacterial disease risk in the Kauma subpopulation, however the sample size in the stratum is very limited (154 cases, 318 controls) and may simply reflect insufficient power to detect an association.

Effect of rs183868412 genotype on risk of invasive bacterial disease stratifies by self-reported ethnicity.

Discovery population Numbers Genotypes MAF OR (95% CI) p-Value
Giriama Cases Overall 1232 0/56/1176 0.023 1.97 (1.30–3.01)
Bacteraemia 558 0/38/520 0.034
SM − P(SM|Data)<0.5 199 0/12/187 0.03
SM − P(SM|Data)>0.5 475 0/6/469 0.006
Controls 1269 0/41/1228 0.016
Chonyi Cases Overall 503 0/38/465 0.038 2.18 (1.34–3.54)
Bacteraemia 238 0/27/211 0.057
SM − P(SM|Data)<0.5 105 0/4/101 0.019
SM − P(SM|Data)>0.5 160 0/7/153 0.022
Controls 1057 0/43/1014 0.02
Kauma Cases Overall 154 0/8/146 0.026 1.20 (0.50–2.85)
Bacteraemia 70 0/6/64 0.043
SM − P(SM|Data)<0.5 25 0/1/24 0.02
SM − P(SM|Data)>0.5 59 0/1/58 0.008
Controls 318 0/20/298 0.031
Other Cases Overall 219 1/16/202 0.041 2.46 (1.01–5.96)
Bacteraemia 101 1/8/92 0.05
SM − P(SM|Data)<0.5 38 0/5/33 0.066
SM − P(SM|Data)>0.5 80 0/3/77 0.019
Controls 165 0/7/158 0.021
Total Cases Overall 2588 3/164/2421 0.033 2.13 (1.65–2.74)
Bacteraemia 1445 3/125/1317 0.045
SM − P(SM|Data)<0.5 375 0/20/355 0.027
SM − P(SM|Data)>0.5 768 0/19/749 0.012
Controls 2812 0/111/2701 0.02

To assess whether our analysis could be affected by our choice of model to define severe malaria case weights, we restricted our analysis to samples with data available to calculate estimates for using both Model 1 and Model 2 (n=909 severe malaria cases). We recalculated effect estimates for the rs183868412 association with invasive bacterial disease using each model alone. The association with invasive bacterial disease at rs183868412:T is robust to the choice of the model for case weights, with effect estimates derived using Model 1 alone (OR = 2.13, 95% CI 1.65–2.75, ) and Model 2 alone (OR = 2.05, 95% CI 1.59–2.65, ) being entirely consistent with those seen in the main analysis (Figure 3—figure supplement 3). Moreover, restricting our analysis to cases of culture-confirmed bacteraemia, the effect estimate for bacteraemia risk observed in the discovery analysis (1445 cases, 2812 controls; OR = 2.12, 95% CI 1.60–2.82, ) is consistent with that seen in the main model.

We sought to replicate evidence of association in our discovery analysis through use of an independent case-control collection of Kenyan children with bacteraemia (n=434) and healthy controls (n=1258) conducted in the same population. The peak trait-associated variants in the discovery analysis were well imputed in the replication data (rs183868412 imputation info score ). In that analysis, we found evidence supporting the association at chromosome 2 with invasive bacterial disease (Figure 4, Table 5: rs183868412:T, OR = 2.85, 95% CI 1.54–5.28, ). In a fixed effects meta-analysis, rs183868412:T was strongly associated with risk of invasive bacterial disease in Kenyan children: OR = 2.22, 95% CI 1.76–2.80, . That association was driven by children with culture-confirmed bacteraemia and critically unwell children with malaria parasites, but a low probability of ‘true’ severe malaria. In a stratified analysis (Figure 4, Table 5), rs183868412 was associated with culture-confirmed bacteraemia (OR = 2.12, 95% CI 1.60–2.82, ) and critical illness with parasitaemia and with a low probability of ‘true’ severe malaria (P(SM|Data)<0.5: OR = 2.37, 95% CI 1.27–4.43, ), but was not associated with risk of critical illness with a high probability of ‘true’ severe malaria (P(SM|Data)>0.5: ).

#' @width 40
#' @height 30
for_plot <- read.table("birc6_for_RA_b38.txt", header = T)

birc6.region <- subset(for_plot, bp>31277134 & bp<33274933)

birc6.region$bin_r2 <- 1
birc6.region$bin_r2[which(birc6.region$r2>0.1 & birc6.region$r2 <= 0.3)] <- 2
birc6.region$bin_r2[which(birc6.region$r2>0.3 & birc6.region$r2 <= 0.5)] <- 3
birc6.region$bin_r2[which(birc6.region$r2>0.5 & birc6.region$r2 <= 0.7)] <- 4
birc6.region$bin_r2[which(birc6.region$r2>0.7)] <- 5

birc6.region$bin_dp <- 1
birc6.region$bin_dp[which(birc6.region$dp>0.6 & birc6.region$dp <= 0.7)] <- 2
birc6.region$bin_dp[which(birc6.region$dp>0.7 & birc6.region$dp <= 0.8)] <- 3
birc6.region$bin_dp[which(birc6.region$dp>0.8 & birc6.region$dp <= 0.9)] <- 4
birc6.region$bin_dp[which(birc6.region$dp>0.9)] <- 5

birc6.region$annotate <- 0
birc6.region$annotate[c(which(birc6.region$rsid=="rs183868412:T"))] <- 1
cols3 <- brewer.pal(8,"Paired")
cols2 <- brewer.pal(11,"Spectral")
cols <- brewer.pal(8,"Set2")

#BIRC6 - regional assocaition plot (discovery data)

birc6_total <- ggplot(birc6.region, aes(x=bp, y=-log10(p))) + 
  xlim(31874931, 32874933) +
  geom_point(data=subset(birc6.region, bin_r2==1), color=cols[8], size=3) + 
  geom_point(data=subset(birc6.region, bin_r2==2), color=cols2[5], size=3) + 
  geom_point(data=subset(birc6.region, bin_r2==3), color=cols2[3], size=3) + 
  geom_point(data=subset(birc6.region, bin_r2==4), color=cols2[2], size=3) + 
  geom_point(data=subset(birc6.region, bin_r2==5), color=cols2[1], size=3) +
  scale_y_continuous(name="-log P-value", breaks=c(0,2,4,6,8,10)) +
  xlab(NULL) + 
  theme_bw() + theme(legend.position = "none", axis.text=element_text(size=12),
                     axis.title=element_text(size=12)) +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())+
  geom_text_repel( data=subset(birc6.region, annotate==1), aes(label=rsid), size=5, col = c("black"), nudge_y = 1) +
  annotate("text", x = (32180000-225069), y = 11, label = quote(r^2), hjust = 0.5) +
  annotate("point", x = (32120000-225069), y = 10.3, size = 3, colour = cols2[1]) +
  annotate("text", x = (32180000-225069), y = 10.3, label = c("0.7-1.0"), hjust = 0.5) +
  annotate("point", x = (32120000-225069), y = 9.6, size = 3, colour = cols2[2]) +
  annotate("text", x = (32180000-225069), y = 9.6, label = c("0.5-0.7"), hjust = 0.5) +
  annotate("point", x = (32120000-225069), y = 8.9, size = 3, colour = cols2[3]) +
  annotate("text", x = (32180000-225069), y = 8.9, label = c("0.3-0.5"), hjust = 0.5) +
  annotate("point", x = (32120000-225069), y = 8.2, size = 3, colour = cols2[5]) +
  annotate("text", x = (32180000-225069), y = 8.2, label = c("0.1-0.3"), hjust = 0.5) +
  annotate("point", x = (32120000-225069), y = 7.5, size = 3, colour = cols[8]) +
  annotate("text", x = (32180000-225069), y = 7.5, label = c("<0.1"), hjust = 0.5) +
  annotate("text", x = (33100000-225069), y = 11, label = c("Discovery"), hjust = 1, size=10)

#BIRC6 - regional assocaition plot (replication data)

birc6_rep <- ggplot(birc6.region, aes(x=bp, y=-log10(p_rep))) + 
  xlim(31874931, 32874933) +
  geom_point(data=subset(birc6.region, bin_r2==1), color=cols[8], size=3) + 
  geom_point(data=subset(birc6.region, bin_r2==2), color=cols2[5], size=3) + 
  geom_point(data=subset(birc6.region, bin_r2==3), color=cols2[3], size=3) + 
  geom_point(data=subset(birc6.region, bin_r2==4), color=cols2[2], size=3) + 
  geom_point(data=subset(birc6.region, bin_r2==5), color=cols2[1], size=3) +
  scale_y_continuous(name="-log10(p)", breaks=c(0,2,4), limits=c(NA,4)) + 
  xlab(NULL) + 
  theme_bw() + theme(legend.position = "none", axis.text=element_text(size=12),
                     axis.title=element_text(size=12)) +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())+
  geom_text_repel( data=subset(birc6.region, annotate==1), aes(label=rsid), size=5, col = c("black"), nudge_y = 0.5) +
  annotate("text", x = 32874933, y = 3.8, label = c("Replication"), hjust = 1, size=10)


#plot genes
genes <- read.table("genes_b38.txt", header = T)

plot.range <- c(31874931, 32874933)
genes$order <- rep(seq(1:1),100)[c(1:length(genes$end_position))]
genes.plot <- ggplot(genes, aes(x=start, y=order+1)) + 
  geom_point(size=0) +
  xlim(31874931, 32874933) +
  ylim(c(1.9,2.1)) +
  geom_segment(data = genes[seq(9,15,2),],
               aes(x=start, xend=end, y=order+1, yend=order+1), size = 1, colour = cols3[2],
               arrow = arrow(length = unit(0.25, "cm"))) +
  geom_segment(data = genes[seq(10,16,2),],
               aes(x=start, xend=end, y=order+1.1, yend=order+1.1), size = 1, colour = cols3[2],
               arrow = arrow(length = unit(0.25, "cm"))) +
  geom_text_repel( data = genes[seq(9,15,2),], aes(x=mid, label=external_gene_name), size=4, col = c("black"),
                   nudge_y =-0.05, segment.color = NA) +
  geom_text_repel( data = genes[seq(10,16,2),], aes(x=mid, label=external_gene_name), size=4, col = c("black"),
                   nudge_y =0.05, segment.color = NA) +
  theme_bw() +
  theme(axis.title.y=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        panel.border = element_blank(), 
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())

#plot recombination rate

recomb <- read.table("genetic_map_chr2_b38.txt", header = F)
colnames(recomb) <- c("position", "rate", "map")
recomb_birc6 <- subset(recomb, position>31874931 & position<32874933)

recomb_rate <- ggplot(recomb_birc6, aes(x=position, y=rate)) + 
  geom_line() +
  theme_bw() +
  ylab("cM/Mb") +
  xlab("chromosome 2") +
  scale_x_continuous(breaks=c(32000000,32200000,32400000,32600000,32800000),
                     labels=c("32.0Mb", "32.2Mb", "32.4Mb", "32.6Mb", "32.8Mb")) +
  theme(panel.border = element_blank(), 
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.line = element_line(colour = "black"))



cols <- brewer.pal(11,"RdYlBu")

data <- read.table("prob_plot.txt", header = T)

#Plot distribution of 'true' severe malaria probabilities

data$prob_strata_half <- "SM: P(SM|Data)<0.5"
data$prob_strata_half[which(data$p_bact<0.5)] <- "SM: P(SM|Data)>0.5"
data$prob_strata_half[which(data$prob_strata=="BACT")] <- "BACT"
data$prob_strata_half[which(data$prob_strata=="CONTROL")] <- "CONTROL"
data$prob_strata_half <- factor(data$prob_strata_half)
data$prob_strata_half <- relevel(data$prob_strata_half, ref = "CONTROL")

data1 <- data[-c(which(data$prob_strata_half=="CONTROL"), which(data$prob_strata_half=="BACT")),]

prob_SM_hist<-ggplot(data1, aes(x=(1-p_bact), fill=factor(prob_strata_half), colour=factor(prob_strata_half))) + 
  geom_histogram(binwidth=0.012, alpha=0.72) +
  scale_fill_manual(values = cols[c(4,2)]) +
  scale_colour_manual(values = cols[c(4,2)]) +
  theme_bw() + xlab("P(Severe Malaria | Data)") + theme(legend.position = "none") +
  geom_hline(yintercept=0, color=cols[8]) +
  theme(axis.text=element_text(size=12),
        axis.title=element_text(size=14))

cols <- brewer.pal(8, "Set2")
cols2 <- brewer.pal(8, "Dark2")

#Plot effect of rs86868412 genotype across case strata

label <- rep(c("SM - P.high", "SM - P.low", "Bacteraemia - disc", "Bacteraemia - rep"),3)
mean  <- c(
  0.07180831, 
  0.8637545, 
  0.7536469, 
  1.06025)
se  <- c(
  0.3202724, 
  0.3192534, 
  0.1447982, 
  0.313471)
lower <- mean-1.96*se
upper <- mean+1.96*se

df <- data.frame(label, mean, lower, upper)
df$facet <- "rs183868412:T"

#levels(df$label)

cols3 <- brewer.pal(8, "Set2")
cols <- brewer.pal(11,"RdYlBu")

# reverses the factor level ordering for labels after coord_flip()
df$label <- factor(df$label, levels=rev(df$label)[c(1:4)])
birc6.fp <- ggplot(data=df, aes(x=label, y=mean, ymin=lower, ymax=upper)) +
  geom_pointrange(fatten=4, size = 1.5) + aes(fill = label, col=label) + scale_fill_manual(values = cols[c(10,10,4,2)]) + scale_colour_manual(values = cols[c(10,10,4,2)]) +
  geom_hline(yintercept=0, lty=2) +  # add a dotted line at x=1 after flip
  coord_flip() +  # flip coordinates (puts labels on y axis)
  ylab("OR") + 
  scale_y_continuous(breaks=c(-0.693147181, 0, 0.693147181, 1.386294361),
                     labels=c("0.5",	"1.0",	"2.0",	"4.0"), limits=c(-1,2)) +
  scale_x_discrete(expand = expand_scale(add = 1)) +
  theme_bw() + theme(legend.position = "none", axis.title.y = element_blank(), axis.text=element_text(size=15),
                     axis.title=element_text(size=15))+
  facet_wrap(~facet, ncol=1)+
  theme(strip.background = element_rect(color="black", fill=cols3[8], size=1.5, linetype="solid"), strip.text.x = element_text(
    size = 12, color = "white", face = "bold.italic"))

((birc6_total/birc6_rep/genes.plot/recomb_rate)/(prob_SM_hist|birc6.fp)) + plot_layout(heights = c(3, 1.5, 1, 1,3))

Association with invasive bacterial disease at the BIRC6 locus.

(A) Regional association plot of invasive bacterial disease association at the BIRC6 locus in the discovery and replication analyses. SNPs are coloured according to linkage disequilibrium to rs183868412. Genomic coordinates are GRCh38. (B) Distribution of ‘true’ severe malaria probabilities among malaria cases estimated from plasma _Pf_HRP2 concentration and platelet count (n=909) and white blood cell count and platelet count (n=234). (C) Odds ratios and 95% confidence intervals of rs183868412 association with disease stratified by malaria cases with high (p>0.5, red) and low (p<0.5, orange) probabilities of ‘true’ severe malaria and culture-proven invasive bacterial disease (blue). P(SM|Data) represents the probability of ‘true’ severe malaria estimated from plasma _P_fHRP2 concentration and platelet count (n=909) or white blood cell count and platelet count (n=234). Effect estimates were calculated with multinomial logisitic regression (SM - P.high, SM - P.low, Bacteraemia - disc) and logistic regression (Bacteraemia - rep).

Principal components of genome-wide genotyping data in replication samples.

Individuals are colour-coded according to self-reported ethnicity; Chonyi in red, Giriama in blue, Kauma in green, and other in grey.

Effect of rs183868412 genotype on risk of invasive bacterial disease in Kenyan children.

Numbers Genotypes MAF OR (95% CI) p-Value
Discovery Cases Overall 2588 3/164/2421 0.033 2.13 (1.65–2.74)
Bacteraemia* 1445 3/125/1317 0.045 2.12 (1.60–2.82)
SM − P(SM|Data)<0.5* 375 0/20/355 0.027 2.37 (1.27–4.43)
SM − P(SM|Data)>0.5* 768 0/19/749 0.012 1.07 (0.57–2.01)
Controls 2812 0/111/2701 0.02
Replication Cases 434 0/24/410 0.028 2.85 (1.54–5.28)
Controls 1258 0/28/1230 0.011
Meta-analysis Cases 3022 3/188/2831 0.032 2.23 (1.76–2.80)
Controls 4070 0/139/3931 0.017

rs183868412 is associated with risk of invasive bacterial disease secondary to diverse pathogens and is independent of malaria

Previous data describing the genetic risk of invasive bacterial disease in this population have identified pathogen-specific effects. To better-understand the range of pathogens to which genetic variation at BIRC6 modifies risk, we estimated the effect of rs183868412 on the risk of bacteraemia caused by the seven most common causative pathogens within this population (Figure 5A). In that analysis, the data best-supported a model in which genotype increases risk of bacteraemia caused by a broad range of pathogens, including bacteraemia secondary to pneumococcus, nontyphoidal Salmonellae, Escherichia coli, β-haemolytic streptococci, Staphylococcus aureus, and other less common pathogens grouped as a single stratum (log10 Bayes factor = 4.72). Genotype at rs183868412 similarly modified risk of bacteraemia in the neonatal period and in older children (log10 Bayes factor = 2.70, Figure 5B).

#' @width 40
#' @height 30
#Plot pathogen-specific effects of rs183868412 genotype

cols <- brewer.pal(11, "RdYlBu")
cols2 <- brewer.pal(8, "Dark2")
cols3 <- brewer.pal(8, "Set2")

label <- rep(c("Aci (n=118)", "BHS (n=130)", "E. coli (n=141)", "Hib (n=113)", "NTS (n=159)", "Other (n=242)","Pneumo (n=391)", "S. aureus (n=151)"),3)
mean  <- c(0.1294529, 0.97544283, 0.94372723, 0.02255999, 0.99779713, 0.80957666, 0.80415958, 0.53290879)
se  <- c(0.4374372, 0.3009951, 0.2873546, 0.4725781, 0.2694641, 0.2438325, 0.2045006, 0.3296354)
lower <- mean-1.96*se
upper <- mean+1.96*se
  
df <- data.frame(label, mean, lower, upper)
df$facet <- "rs183868412:T"

# reverses the factor level ordering for labels after coord_flip()
df$label <- factor(df$label, levels=rev(df$label)[c(1:8)])
path.fp <- ggplot(data=df, aes(x=label, y=mean, ymin=lower, ymax=upper)) +
  geom_pointrange(fatten=4, size = 1.5) + aes(fill = label, col=label) + scale_fill_manual(values = c(cols[c(10,10,10,10)],cols3[8],cols[c(10,10)],cols3[8])) + scale_colour_manual(values = c(cols[c(10,10,10,10)],cols3[8],cols[c(10,10)],cols3[8])) +
  geom_hline(yintercept=0, lty=2) +  # add a dotted line at x=1 after flip
  coord_flip() +  # flip coordinates (puts labels on y axis)
  ylab("OR") + scale_x_discrete(expand = expand_scale(add = 1)) +
  scale_y_continuous(breaks=c(-0.693147181, 0, 0.693147181, 1.386294361),
                     labels=c("0.5",	"1.0",	"2.0",	"4.0"), limits = c(-1,2)) +
  theme_bw() + theme(legend.position = "none", axis.title.y = element_blank(), axis.text=element_text(size=15),
                     axis.title=element_text(size=15),plot.title=element_text(size=20))+
  facet_wrap(~facet, ncol=1)+
  ggtitle("Pathogens") +
  theme(strip.background = element_rect(color="black", fill=cols3[8], size=1.5, linetype="solid"), strip.text.x = element_text(
    size = 15, color = "white", face = "bold.italic"))

#Plot stratum-specific effects of rs183868412 genotype on neonatal on non-neonatal disease

label <- rep(c("1st 29 days (n=195)", ">28 days (n=1,245)"),3)
mean  <- c(0.9121288, 0.5966586)
se  <- c(0.2547673, 0.1514635)
lower <- mean-1.96*se
upper <- mean+1.96*se

df <- data.frame(label, mean, lower, upper)
df$facet <- "rs183868412:T"

# reverses the factor level ordering for labels after coord_flip()
df$label <- factor(df$label, levels=rev(df$label)[c(1:2)])
neo.fp <- ggplot(data=df, aes(x=label, y=mean, ymin=lower, ymax=upper)) +
  ylim(-1,2) +
  geom_pointrange(fatten=4, size = 1.5) + aes(fill = label, col=label) + scale_fill_manual(values = c(cols[c(10,10)])) + scale_colour_manual(values = c(cols[c(10,10)])) +
  geom_hline(yintercept=0, lty=2) +  # add a dotted line at x=1 after flip
  coord_flip() +  # flip coordinates (puts labels on y axis)
  ylab("OR") + scale_x_discrete(expand = expand_scale(add = 1)) +
  scale_y_continuous(breaks=c(-0.693147181, 0, 0.693147181, 1.386294361),
                     labels=c("0.5",	"1.0",	"2.0",	"4.0"), limits=c(-1,2)) +
  theme_bw() + theme(legend.position = "none", axis.title.y = element_blank(), axis.text=element_text(size=15),
                     axis.title=element_text(size=15),plot.title=element_text(size=20))+
  facet_wrap(~facet, ncol=1)+
  ggtitle("Neonatal sepsis") +
  theme(strip.background = element_rect(color="black", fill=cols3[8], size=1.5, linetype="solid"), strip.text.x = element_text(
    size = 15, color = "white", face = "bold.italic"))


#Plot stratum-specific effects of rs183868412 genotype on neonatal on non-neonatal disease

label <- rep(c("Bact. no Pf (n=1,236)", "Bact. with Pf (n=204)"),3)
mean  <- c(0.5811352, 0.9540062)
se  <- c(0.1525740, 0.2424632)
lower <- mean-1.96*se
upper <- mean+1.96*se

df <- data.frame(label, mean, lower, upper)
df$facet <- "rs183868412:T"

#levels(df$label)

cols3 <- brewer.pal(8, "Set2")


# reverses the factor level ordering for labels after coord_flip()
df$label <- factor(df$label, levels=rev(df$label)[c(1:2)])
para.fp <- ggplot(data=df, aes(x=label, y=mean, ymin=lower, ymax=upper)) +
  geom_pointrange(fatten=4, size = 1.5) + aes(fill = label, col=label) + scale_fill_manual(values = c(cols[c(10,10)])) + scale_colour_manual(values = c(cols[c(10,10)])) +
  geom_hline(yintercept=0, lty=2) +  # add a dotted line at x=1 after flip
  coord_flip() +  # flip coordinates (puts labels on y axis)
  ylab("OR") + scale_x_discrete(expand = expand_scale(add = 1)) +
  scale_y_continuous(breaks=c(-0.693147181, 0, 0.693147181, 1.386294361),
                     labels=c("0.5",	"1.0",	"2.0",	"4.0"), limits=c(-1,2)) +
  theme_bw() + theme(legend.position = "none", axis.title.y = element_blank(), axis.text=element_text(size=15),
                     axis.title=element_text(size=15),plot.title=element_text(size=20))+
  facet_wrap(~facet, ncol=1)+
  ggtitle("Parasitaemia") +
  theme(strip.background = element_rect(color="black", fill=cols3[8], size=1.5, linetype="solid"), strip.text.x = element_text(
    size = 15, color = "white", face = "bold.italic"))

#Plot stratum-specific effects of rs183868412 genotype on risk of bacteraemia over study period

cols <- brewer.pal(11, "RdYlBu")
cols2 <- brewer.pal(8, "Dark2")


label <- c("95", "96", "97", "98", "98/99/00", "00","01/02","02","03/04","04","05/06","06","07/08","08", "09/10", "10","11" , "12", "13","14", "15")
mean  <- c(NA,NA,NA,NA,0.826615, NA, 0.796132,NA,0.546421,NA,0.510647,NA,0.567562,NA,1.03668,NA,NA,NA,NA,NA,NA)
se  <- c(NA,NA,NA,NA,0.174712, NA,0.203987,NA, 0.200618, NA, 0.242699,NA, 0.268059,NA, 0.509352,NA,NA,NA,NA,NA,NA)
lower <- mean-1.96*se
upper <- mean+1.96*se

df <- data.frame(label, mean, lower, upper)
df$facet <- "Bacteraemia risk & rs183868412:T"

#levels(df$label)

cols3 <- brewer.pal(8, "Set2")


# reverses the factor level ordering for labels after coord_flip()
df$label <- factor(df$label, levels=c("95", "96", "97", "98", "98/99/00", "00","01/02","02","03/04","04","05/06","06","07/08","08", "09/10", "10","11" , "12", "13","14", "15"))
yr.fp <- ggplot(data=df, aes(x=label, y=mean, ymin=lower, ymax=upper)) +
  scale_x_discrete(breaks=c("95","00","05/06","10"),
                                 labels=c("1995", "2000", "2005", "2010"),expand = expand_scale(add = 1))+
  geom_pointrange(fatten=4, size = 1.5) + aes(fill = label, col=label) + scale_fill_manual(values = cols[rep(10,21)]) + scale_colour_manual(values = c(cols[rep(10,21)])) +
  geom_hline(yintercept=0, lty=2) +  # add a dotted line at x=1 after flip
  ylab("OR") +
  scale_y_continuous(breaks=c(-0.693147181, 0, 0.693147181, 1.386294361, 2.079441542),
                     labels=c("0.5",	"1.0",	"2.0",	"4.0", "8.0"), limits=c(-1,2.5)) +
  theme_bw() + theme(legend.position = "none", axis.title.x = element_blank(), axis.text=element_text(size=15),
                     axis.title=element_text(size=15),panel.grid.major = element_blank(), 
                     panel.grid.minor = element_blank(),axis.text.x=element_blank(),
                     axis.ticks.x=element_blank())+
  facet_wrap(~facet, ncol=1)+
  theme(strip.background = element_rect(color="black", fill=cols3[8], size=1.5, linetype="solid"), strip.text.x = element_text(
    size = 15, color = "white", face = "bold.italic"))

pp = read.table("para_prev.txt", header = TRUE)

#Plot falling parasite prevalence among trauma admission over the study period

para_prev<-ggplot(data=pp, aes(x=year, y=malinc)) + geom_line(lwd=1.5) +
geom_ribbon(aes(ymin=low, ymax=high), linetype=2, alpha=0.3, fill = "grey70", color = "red", 
            outline.type = "both") +
  theme_bw() + xlab("Year") + ylab("Prevalence (%)") + theme(panel.grid.major = element_blank(), 
                                                             panel.grid.minor = element_blank(), 
                                                             axis.title=element_text(size=15), 
                                                             axis.text=element_text(size=12),
                                                             plot.title=element_text(size=18))

(path.fp|(neo.fp/para.fp))/(yr.fp/para_prev) + plot_annotation(tag_levels = 'A')

Genetic variation at BIRC6 confers broad susceptibility to invasive bacterial disease.

Odds ratios and 95% confidence intervals of rs183868412 association with invasive bacterial disease stratified by pathogen (A), neonatal and non-neonatal sepsis (B), and bacteraemia with and without malaria parasitaemia (C). Odds ratios and 95% confidence intervals of rs183868412 association with invasive bacterial disease stratified by year (D), compared to age-standardised, annual malaria parasite prevalence in Kilifi, Kenya, as estimated from parasite prevalence among trauma admissions (E).We calculated effect estimates using multinomial logistic regression. We compared models of association across strata using a Bayesian approach (see Materials and methods). Strata associated with rs183868412 genotype in the most likely model in each analysis are highlighted in blue.

Malaria infection results in increased risk of invasive bacterial disease secondary to a broad range of pathogens 51Scott et al.2011, and genetic determinants of malaria risk (e.g. sickle cell trait) modify risk of bacterial infection in malaria-endemic settings 51Scott et al.2011. The observation that, among children with a clinical diagnosis of severe malaria, risk of disease is only modified by rs183868412 in children with a low probability that their disease represents ‘true’ severe malaria (Figure 5) suggests that the effect of genetic variation at BIRC6 on invasive bacterial disease risk operates independently of malaria. In keeping with this, the data best-supports an effect of rs183868412 genotype on bacteraemia risk in children both with and without concomitant parasitaemia (log10 Bayes factor = 2.73, Figure 5C). In addition, unlike sickle cell trait 51Scott et al.2011, the increased risk of invasive bacterial infection conferred by rs183868412:T carriage in the study setting is stable across a period of declining malaria prevalence (Bayes factor = 8.7, Figure 5D).

Evidence of selection pressure and pleiotropy at rs183868412

Common genetic variation associated with a twofold increased risk of invasive bacterial infection in children, in particular across a broad range of pathogens, will be subject to considerable negative selection pressure. The derived allele rs183868412:T, associated with increased risk of invasive bacterial disease in Kenyan children, is absent in non-African populations (https://gnomad.broadinstitute.org). Within Africa, rs183868412:T is present in all nine African populations included in the MalariaGEN consortium project (4Band2019; Table 6), MAFs ranging from 0.011 in The Gambia to 0.034 in Malawi. There is no evidence for within-Africa differentiation at rs183868412 () providing no support for a selective sweep at the locus. We further evaluated evidence for recent directional selection pressure, examining integrated haplotype scores (iHS) 54Voight et al.2006 in seven African populations included in the 1000 Genomes Project. In those data, there is no evidence to support recent selection at the locus (minimum rank , maximum iHS = 1.3). Finally, to understand whether variation at this locus may be maintained in populations through beneficial effects on other traits, we examined evidence for pleiotropy at rs183868412 in a meta-analysis of GWAS data 25Gurdasani et al.2019 of 33 cardio-metabolic traits from Uganda, Ghana, Nigeria, South Africa, and Kenya (sample size ranging from 2671 to 14,126 individuals). In keeping with the MalariaGEN consortium project data, MAFs at rs183868412 range from 0.015 to 0.028. In these data there is no evidence for pleiotropy at the locus, with no evidence for an effect of rs183868412 on any of the 33 traits tested (minimum meta-analysis ). The explanation for the persistence of this polymorphism, therefore, remains an open question.

rs183868412 frequencies in Africa.

Population Number MAF
Gambia 2605 0.011
Mali 183 0.021
Burkina Faso 596 0.009
Ghana 320 0.014
Nigeria 22 0.024
Cameroon 685 0.031
Malawi 1317 0.034
Tanzania 402 0.028
Kenya 1681 0.017

rs183868412 is associated with alternative splicing of BIRC6 in stimulated monocytes

Trait-associated genetic variation identified by GWAS is highly enriched for regulatory variation. The African specificity of the trait-associated variation identified here makes annotation with publicly available regulatory mapping data challenging. To investigate the regulatory function of rs183868412 in immune cells in African populations, we used eQTL Catalogue (https://www.ebi.ac.uk/eqtl) mapping data 36Kerimov et al.2021 from 100 African ancestry individuals in primary monocytes with and without innate stimulation 49Quach et al.2016; influenza A virus, LPS, Pam3CSK4 (synthetic lipoprotein and TLR1/2 agonist) and R848 (a TLR7/8 agonist). In this dataset, rs183868412 is well imputed (), with an MAF of 0.05 (10 heterozygous individuals). We found no evidence for a regulatory effect of rs183868412 at the gene level in monocytes regardless of stimulation. We did, however, observe an effect of rs183868412 genotype on expression of a 12 bp BIRC6 exonic sequence (chr2:32,453,943–32,453,954, ), with evidence for colocalisation of this eQTL with our GWAS signal (posterior probability of colocalisation, PP4 = 0.951, Figure 6). This effect was only observed following stimulation with Pam3CSK4 (Figure 6), with the bacteraemia risk allele, rs183868412:T, being associated with reduced expression of this sequence. That 12 bp sequence is associated with an alternative splicing event that results in extension of a BIRC6 exon. The 23rd exon (ENSE00001189810, chr2:32,453,808–32,453,942) of the canonical BIRC6 transcript, ENST00000421745.6, is 135 bp long and terminates immediately before the 12 bp sequence associated with rs183868412:T genotype. The 22nd exon (ENSE00003835010, chr2:32,453,808–32,453,942) of an alternative BIRC6 transcript, ENST00000648282.1, is 147 bp long, having the same start site but including the 12 bp sequence at its 3’ end. Thus, increased risk of invasive bacterial disease may be associated with decreased expression of an alternative BIRC6 transcript in TLR1/2-stimulated monocytes.

#' @width 20
#' @height 14
library(dplyr)
library(ggplot2)
library(patchwork)
library(RColorBrewer)
library(ggrepel)
library(ggplotify)
library(nnet)
library(coloc)

#Colocalisation of BIRC6 exon eQTL with invasive bacterial disease association at rs183868412
d1 <- read.table("exon_birc6.txt", header = T)
ld <- as.matrix(read.table("ld_for_coloc.txt", row.names = 1, header = T))

eqtl.list <- list(beta = d1$BETA, varbeta = (d1$SE)^2, type = "quant", N = 200, MAF = d1$maf, snp = as.character(d1$SNP), LD=ld)
gwas.list <- list(beta = d1$gwas.beta, varbeta = (d1$gwas.se)^2, type = "cc", N = 5400, s = 0.479, snp = as.character(d1$SNP), LD=ld)
E=runsusie(eqtl.list,coverage=0.01)
G=runsusie(gwas.list, coverage=0.01)
susie.res=coloc.susie(E,G)
h4 <- round(max(susie.res$summary[,8]),2)
hit1 <- as.character(susie.res$summary[which(susie.res$summary[,8]==max(susie.res$summary[,8])),2]$hit1)
hit2 <- as.character(susie.res$summary[which(susie.res$summary[,8]==max(susie.res$summary[,8])),3]$hit2)

#plot colocalsation
exon <- d1

cols <- brewer.pal(8,"Set2")
cols2 <- brewer.pal(11,"Spectral")

exon$bin_r2 <- 1
exon$bin_r2[which(exon$r2>0.1 & exon$r2 <= 0.3)] <- 2
exon$bin_r2[which(exon$r2>0.3 & exon$r2 <= 0.5)] <- 3
exon$bin_r2[which(exon$r2>0.5 & exon$r2 <= 0.7)] <- 4
exon$bin_r2[which(exon$r2>0.7)] <- 5

exon.plot <- ggplot(exon, aes(x=-log10(gwas.p), y=-log10(PVAL))) + 
  geom_point(data=subset(exon, bin_r2==1), color=cols[8], size=3) + 
  geom_point(data=subset(exon, bin_r2==2), color=cols2[5], size=3) + 
  geom_point(data=subset(exon, bin_r2==3), color=cols2[3], size=3) + 
  geom_point(data=subset(exon, bin_r2==4), color=cols2[2], size=3) + 
  geom_point(data=subset(exon, bin_r2==5), color=cols2[1], size=3) +
  ylim(0,5) +
  xlim(0,10) +
  ylab("BIRC6 exon eQTL -log10(p)") + 
  xlab("Bacterial disease GWAS -log10(p)") + 
  annotate("text", x = 0, y = 5, label = paste0("PP4=",h4), size=5, hjust=0) +
  theme_bw() + theme(legend.position = "none", axis.text=element_text(size=12),
                     axis.title=element_text(size=15), plot.title=element_text(size=20))


#Plot effects of rs184868412 genotype on exon expression in monocytes across stimulation contexts

label <- c("Naive", "IAV", "LPS", "Pam3CSK4", "R848")
mean  <- c(-0.155186,-0.0536451, 0.227218, -0.654215, -0.386583)
se  <- c(0.138402, 0.248336, 0.152519, 0.156258, 0.198497)
lower <- mean-1.96*se
upper <- mean+1.96*se

df <- data.frame(label, mean, lower, upper)
df$facet <- "rs183868412:T"

cols3 <- brewer.pal(8, "Set2")
df$label <- factor(df$label, levels=rev(df$label)[c(3,1,2,4,5)])

eqtl.fp <- ggplot(data=df, aes(x=label, y=mean, ymin=lower, ymax=upper)) +
  geom_pointrange(fatten=4, size = 1.5) + aes(fill = label, col=label) + scale_fill_manual(values = c(cols3[c(8,8,3,8,8)])) + scale_colour_manual(values = c(cols3[c(8,8,3,8,8)])) +
  geom_hline(yintercept=0, lty=2) +  
  coord_flip() +  
  ylab("beta") + scale_x_discrete(expand = expand_scale(add = 1)) +
  theme_bw() + theme(legend.position = "none", axis.title.y = element_blank(), axis.text=element_text(size=15),
                     axis.title=element_text(size=15),plot.title=element_text(size=20))+
  facet_wrap(~facet, ncol=1)+
  theme(strip.background = element_rect(color="black", fill=cols3[8], size=1.5, linetype="solid"), strip.text.x = element_text(
    size = 15, color = "white", face = "bold.italic"))

(eqtl.fp|exon.plot) + plot_annotation(tag_levels = 'A')

Regulatory function of rs183868412 monocytes.

(A) Betas and 95% confidence intervals of rs183868412 association with expression of a 12 bp BIRC6 exonic sequence (chr2:32,453,943–32,453,954) in monocytes. Monocytes are naive or stimulated with LPS (lipopolysaccharide), IAV (influenza A virus), Pam3CSK4, and R848. (B) Colocalisation of the BIRC6 exon eQTL in Pam3CSK4-stimulated monocytes colocalises with the risk locus for invasive bacterial disease (). SNPs are coloured according to linkage disequilibrium to rs183868412. Effect estimates are calculated by linear regression.

Discussion

In this study, we have leveraged the close relationship between P. falciparum infection and bacteraemia in African children 51Scott et al.2011 to perform a GWAS of invasive bacterial infection in 5400 Kenyan children. We approached this by defining the probability with which each critically unwell child with a clinical diagnosis of severe malaria has a disease process directly mediated by malaria, that is ‘true’ severe malaria. We hypothesised that critically unwell children, with a low probability of having ‘true’ severe malaria, are enriched for invasive bacterial infections. We explored the validity of this approach, demonstrating that children with a low probability of ‘true’ severe malaria were indeed enriched for culture-proven bacteraemia and were at a higher risk of death than children with a higher probability. We therefore performed a GWAS weighting cases according to their likelihood of invasive bacterial disease. In doing so, we have identified and validated BIRC6 as a novel genetic susceptibility locus for all-cause invasive bacterial disease in Kenyan children.

The disease-associated locus modifies risk of invasive bacterial disease caused by a broad range of pathogens, including β-haemolytic streptococci, E. coli, nontyphoidal Salmonella, Streptococcus pneumoniae, and S. aureus. Moreover it modifies risk of invasive infection in both the neonatal period and in older children. Furthermore, in contrast to the rs334 HBB A>T mutation 51Scott et al.2011, rs183868412 modifies risk of invasive bacterial disease in a manner independent of malaria, with rs183868412:T carriage increasing risk of disease across a period of falling malaria transmission and in children with and without concurrent parasitaemia.

We further demonstrate that rs183868412 mediates risk of invasive bacterial disease through the modification of BIRC6 splicing in Pam3CSK4-stimulated monocytes. BIRC6 (Baculovirus inhibitor of apoptosis protein repeat containing 6), also known as BRUCE (BIR repeat containing ubiquitin-conjugating enzyme), encodes a large member of the inhibitor of apoptosis protein (IAP) family 27Hauser et al.1998. Members of the IAP family bind to cognate caspases, inhibiting their activity, and thereby cell death, through occlusion of their active site 53Verhagen et al.2001. A proportion of IAPs also contain an E3 ubiquitin ligase allowing both direct inhibition of caspases and targeting of caspases for proteasomal degradation 53Verhagen et al.2001. BIRC6 contains both inhibitor of apoptosis domains and an E2/E3 ubiquitin ligase, which function to inhibit apoptosis in response to a variety of stimuli, both by interaction with and degradation of caspase-9, but also through the ubiquitination and degradation of SMAC, an IAP antagonist 26Hao et al.20045Bartke et al.2004. BIRC6 also regulates autophagosome-lysosome fusion 19Ebner et al.2018, and ubiquitinates (and targets for degradation) LC3, a key effector of autophagosome formation. Thus, BIRC6 also acts as a negative regulator of autophagy 35Jia and Bonifacino2019.

It is highly plausible that BIRC6 could determine susceptibility to invasive infection through either its regulation of apoptosis or autophagy (or both). Sepsis induces marked changes in apoptosis across a range of immune cells 32Hotchkiss and Nicholson2006. There is markedly enhanced apoptosis in both dendritic cells 31Hotchkiss et al.2002 and in lymphocytes. Enhanced lymphocyte apoptosis is most striking in B cells and CD4+ T cells 30Hotchkiss et al.2001 which, at least in part, is mediated by caspase-9. The consequent lymphopoenia is correlated with both severity of sepsis and outcome 39Le Tulzo et al.2002. In addition to the direct effects of immune cell loss on the innate and adaptive immune responses to invasive infection, sepsis-induced apoptosis induces immune cell dysfunction, phagocytosis of apoptotic cells resulting in reduction in pro-inflammatory cytokine production, and cross-presentation of antigen from apoptotic cells to adaptive immune cells 2Albert2004. In keeping with a role for regulators of apoptosis in the pathogenesis of sepsis, members of the IAP family, including NAIP/BIRC1 and BIRC3, are downregulated in immune cells in patients with sepsis, as is the BIRC6 ubiquitination target SMAC029Hoogerwerf et al.2010. Autophagy contributes to the direct removal of intracellular pathogens and, through the degradation of invading organisms in autophagosomes, directs antigen presentation and pro-inflammatory cytokine secretion 17Deretic et al.2013. As above, BIRC6 regulates autophagy through its interaction with LC3, and overexpression of LC3B limits inflammation and tissue injury in a mouse model of sepsis 41Lo et al.2013.

In keeping with a role for BIRC6 in autophagy and apoptosis in sepsis, our data identify a role for genetic variation at BIRC6 in determining risk of invasive infection secondary to a broad range of bacteria. This is in contrast to previously published data describing susceptibility to invasive bacterial infection, which has highlighted a prominent role for genetic risk factors that are specific to single pathogens 15Davila et al.201024Gilchrist et al.201850Rautanen et al.2016. In this study, the derived allele (T) at rs183868412 was associated with increased risk of bacteraemia secondary to gram-positive (β-haemolytic streptococci, S. pneumoniae, and S. aureus) and gram-negative (E. coli, nontyphoidal Salmonella) pathogens, including intracellular and extracellular bacteria, and enteric and respiratory pathogens. Moreover, rs183868412 modified risk of bacteraemia in both the neonatal period, when infection is likely to be maternally derived, and in older children, when sources of community acquired infection will be more diverse. This modulation of invasive bacterial disease risk, despite diverse sources and routes of infection and diverse mechanisms of invasion, suggests a mechanism in which genetic variation at BIRC6 modifies risk of invasive infection downstream of initial mechanisms of infection and invasion. In an interesting parallel to this, common genetic variation at another ubiquitin-conjugating enzyme, UBE2U, has been shown to modify outcome in meningitis secondary to diverse pathogens in individuals of European ancestry 40Lees et al.2019.

Our study has some important limitations. The African-specific nature of the trait-associated variation identified here limits our ability to comprehensively interrogate the effect of that variation in immune cells. The eQTL mapping data that we utilise here 49Quach et al.2016 is limited in that it allows us only to consider a single immune cell type. It is also important to note that there are relatively few African-ancestry individuals in the eQTL mapping data we utilise here, and the splicing signal at BIRC6 is based on only 10 individuals heterozygous for rs183868412:T. A more complete understanding of the role played by genetic variation at BIRC6 plays in the pathogenesis of sepsis in African children will require more detailed expression and functional studies in African populations. In addition, there is a paucity of large-scale genetic association studies performed in African populations. The African-specific nature of the variation identified in our study therefore limits our ability to explore pleiotropic effects at this locus. Allied to this, the low MAF of rs183868412 will result in very limited power to detect selection events at the locus. Larger and richer datasets detailing genetic variation in African populations will be required to explore the wider phenotypic consequences of variation at BIRC6. Our study uses plasma Pf_HRP2 concentrations to help identify children at low risk of ‘true’ severe malaria. Given the increasing frequency of _pfhrp2 and pfhrp3 deletions in many settings, including in Africa 1Agaba et al.201920Feleke et al.202122Gamboa et al.2010, an understanding of the local prevalence of pfhrp2/3 deletions will be important in considering how to translate this model to other settings.

Taken together, our data identify a role for BIRC6 in the pathogenesis of invasive bacterial infections in Kenyan children. By maximising our available sample size to include children with a high likelihood of invasive bacterial infection, but without culture-confirmed infection, we facilitate novel variant discovery and reveal a common genetic architecture of invasive bacterial disease secondary to diverse pathogens. In doing so, we expand our understanding of the biology of invasive infection in African children. In particular, these data inform our understanding of the biology shared by diverse bacterial infections causing a common clinical syndrome: sepsis.

Materials and methods

Study samples

Recruitment of the severe malaria cases, bacteraemia cases, and healthy controls have been described in detail elsewhere 45Ndila et al.201850Rautanen et al.2016. In brief, children under 14 years admitted to the high dependency ward of Kilifi County Hospital with a clinical diagnosis of severe malaria, defined as P. falciparum parasites on blood film and at least one of reduced Blantyre Coma Score, severe anaemia (Hb < 50 g/L), evidence of respiratory distress, hypoglycaemia, or hyperparasitaemia were eligible for recruitment as cases of severe malaria. During the study period, all children admitted to hospital, with the exception of elective surgical admissions and minor trauma, had a blood sample taken for bacterial culture (BACTEC 9050 instrument, Becton Dickinson, Franklin Lakes, NJ). Children under 14 years in whom a pathogenic organism was identified in blood were eligible for study inclusion (Bacillus species, coryneform bacteria, coagulase-negative staphylococci, Staphylococcus saprophyticus, and Viridans group streptococci were considered contaminants). Control children were recruited between 3 and 12 months of age from consecutive live births from the population which Kilifi County Hospital serves. Control children have been subject to longitudinal follow-up. Following explanation of the study, written informed consent was obtained from the parent or guardian of each child included in the study. Ethical approval was obtained from the Kenya Medical Research Institute (KEMRI) National Scientific Steering and Research Committees (approval numbers: SCC1192 and SCC967) and the Oxford Tropical Research Ethics Committee (OxTREC, approval numbers: 020–06 and 014–01).

Models to define the probability of ‘true’ severe malaria

Among cases of severe malaria recruited to the study, we used probabilistic models to assign a probability of that child’s clinical presentation being mediated by parasite sequestration, that is ‘true’ severe malaria, as described previously 57Watson et al.202158Watson et al.2021. Where available (n=1400), we used platelet counts and _Pf_HRP2 concentrations to derive the probabilistic model. In cases where _Pf_HRP2 concentration was not measured (n=800), we used white blood cell counts and platelet counts as input data to the model.

For both models (Model 1: _Pf_HRP2/platelet counts; Model 2: platelet counts/white blood cell counts), the probabilities of ‘true’ severe malaria were derived by fitting parametric latent class models. These assumed that each patient had a binary latent state (true severe malaria versus not severe malaria). For Model 1, we assumed that in each latent state the data were distributed as a single bivariate normal distribution. For Model 2, the data did not fit well to a two-component bivariate normal (white blood cell counts have much weaker diagnostic value) so we assumed that the data had bivariate Student’s t-distribution for the severe malaria state, and a flexible mixture of bivariates for the not severe malaria state.

Genotyping and imputation

For the discovery analysis, we utilised genotypes generated as part of GWASs of severe malaria 4Band2019 and bacteraemia 50Rautanen et al.2016 previously reported in this population. Bacteraemia cases and controls were genotyped using the Affymetrix SNP 6.0 array and the severe malaria samples using the Illumina Omni 2.5M platform. SNP and sample quality control for both datasets are highly analogous, and have been described previously 4Band201950Rautanen et al.2016. In brief, MalariaGEN SNP QC excluded poorly genotyped SNPs using the following metrics; SNP missingness >2.5%, MAF <1%, Hardy-Weinberg equilibrium (HWE) , plate effect , and a recall test quantifying changes in genotype following a re-clustering process . For Affymetrix-genotyped samples, SNP QC excluded poorly performing SNPs using the following metrics: SNP missingness > 2%, MAF < 1%, genotype probability (info) < 0.975, plate effect , and HWE . Sample QC on both platforms excluded sample outliers with respect to channel intensity, missingness, heterozygosity, population outliers, and duplicated samples (relatedness coefficient >0.75). In addition, for Affymetrix-genotyped samples, samples were excluded in cases of discordant sex as recorded in the clinical record and imputed from mean intensities from X and Y chromosome markers.

To facilitate combining datasets, we applied an additional set of cross-platform QC procedures. We defined a shared subset of SNPs genotyped and passing SNP QC on both platforms (n=167,108), observing high levels of genotype concordance (median concordance 0.993, Figure 3—figure supplement 4) among the subset of samples genotyped on both platforms (n=1365). We used this shared SNP set to compute relatedness estimates and PCs in PLINK v1.90 10Chang et al.2015. The major six PCs of shared genotypes differentiate self-reported ethnicity (Figure 3—figure supplement 1) and are non-differential with respect to genotyping platform (Figure 3—figure supplement 1). To harmonise QC across both platforms, we excluded MalariaGEN samples with discordant clinical and genetic sex (n=136). We further excluded one of duplicate or related sample pairs (relatedness coefficient >0.2) across platforms, retaining case samples where possible and excluding equal numbers of control samples genotyped on each platform (n=1973). Following QC, genotypes were phased using SHAPEIT2 16Delaneau et al.2011, and untyped genotypes imputed genome-wide using IMPUTE2 (v2.3.2) 34Howie et al.201133Howie et al.2009 with 1000 Genomes Phase III as a reference panel. Following imputation, we excluded SNPs with imputation info scores < 0.5, MAF < 1% and HWE , applying each threshold for each platform and overall. Following SNP and sample QC, 14,010,600 autosomal SNPs and indels from 5400 samples (1445 bacteraemia cases, 1143 severe malaria cases, and 2812 control samples: 917 Illumina genotyped and 1895 Affymetrix genotyped) were taken forward as a combined discovery dataset for association analysis. Following QC and association analysis, we identified a further set of population outliers using ABERRANT 7Bellenguez et al.2012 for downstream sensitivity analysis.

Estimation of SNP heritability

To estimate SNP heritability of bacteraemia in this population, we used genome-wide genotyping data from culture-confirmed bacteraemia cases and healthy controls genotyped with the Affymetrix SNP 6.0 array. For this analysis we used directly typed markers passing quality control as described above (n = 783,094). In addition to the sample quality control applied above, we additionally excluded one individual from each sample pair with relatedness coefficient >0.05, leaving a final sample size of 2559 (1042 cases, 1517 controls). We estimated SNP heritability using GCTA-GREML 60Yang et al.2011. For transformation of the heritability estimate to the liability scale, we assumed a population prevalence for bacteraemia of 2% in this setting.

Association analysis and fine mapping

In the discovery analysis, we tested for association between genotype at each variant passing QC and invasive bacterial disease by logistic regression in an additive linear model. We used weighted logistic regression to reflect the probability of each case sample being a ‘true’ case of invasive bacterial infection. Cases of culture-confirmed bacteraemia were assigned a weight of 1, whereas cases of severe malaria were assigned weights of 1 − P(SM|Data), re-weighting the contribution of a case to the log-likelihood according to its probability of representing invasive bacterial infection. Control samples were assigned a weight of 1. Our regression thus assumes that the lower the probability of ‘true’ severe malaria, the greater the probability that a case represents culture-negative invasive bacterial disease. To control for confounding variation, we included the six major PCs of genotyping data and genotyping platform as covariates in the model. Weighted logistic regression was implemented using the glm function in R. As described previously 57Watson et al.2021, standard errors were transformed to reflect the reduced effective sample size resulting from inclusion of sample weights in the model. We considered to be significant.

We used a Bayesian approach to identify a set of SNPs with 95% probability of containing the causal variant at the trait-associated locus. Approximate Bayes factors 56Wakefield2009 were calculated for each SNP in the region (a 200 kb surrounding rs183868412) with a prior distribution of . All SNPs were considered equally likely to be the causal variant a priori. A set of SNPs with 95% probability of containing the causal SNP was defined as the smallest number of SNPs for which the summed posterior probabilities exceed 0.95.

Replication samples and analysis

To replicate our findings from the discovery analysis, we used a second sample set, recruited from the same population as the discovery samples. Replication case samples were cases of bacteraemia only, and did not include cases of severe malaria without culture-confirmed bacterial infection. Case samples were recruited between 1 August 1998 and 30 October 2010. As for the discovery case samples, children under 14 years with a bacterial pathogen isolated from blood on admission to hospital were eligible for recruitment to the study. As above, control samples were recruited as part of a birth cohort from the same population, with children recruited between the ages of 3 and 12 months. Genotyping and QC procedures for these samples have been described previously. In brief, study samples were genotyped using the Immunochip Consortium 12Cortes and Brown2011 array (Illumina). Sample QC was performed as for the discovery samples (above), with duplicate control samples (samples common to MalariaGEN and Immunochip controls, n=78) being removed from the replication set. As above, relatedness estimates and PCs were computed in PLINK v1.90 (10Chang et al.2015; Figure 4—figure supplement 1). SNP QC excluded the following variants: SNP missingness > 1%, MAF < 1%, and HWE . Following QC, 143,000 genotyped variants in 434 cases and 1258 control samples were taken forward for imputation. As above, imputation was performed with SHAPEIT2 16Delaneau et al.2011 and IMPUTE2 (v2.3.2) 34Howie et al.201133Howie et al.2009 with 1000 Genomes Phase III as a reference panel.

Following imputation, we further excluded poorly imputed (imputation info score < 0.5) and rare (MAF < 1%) variants and variants with HWE . At variants associated with invasive bacterial disease () in the discovery analysis, we tested for association with bacteraemia case-control status using logistic regression in an additive model in SNPTEST v2.5.6 44Marchini et al.2007. To exclude confounding variation, we included the major six PCs of genotyping data in the model. We considered evidence of association with bacteraemia in the replication samples with with the same direction of effect as in the discovery analysis to be significant.

Bayesian comparison of models of association

At the locus of interest, we used multinomial logistic regression, implemented in SNPTEST v2.5.6 44Marchini et al.2007 to estimate the additive effect of genotype on risk of bacteraemia stratified by pathogen, bacteraemia in the neonatal and non-neonatal periods, bacteraemia with and without P. falciparum parasitaemia, and bacteraemia presenting at different time periods across a period of declining malaria transmission intensity. For these analyses we used only samples with culture-confirmed bacteraemia. In each case we used control status as the baseline stratum, and included the six major PCs of genotyping data to control for confounding variation as above.

For the pathogen-stratified analysis, we defined eight case strata among the discovery cases, one for each of the seven most commonly isolated organisms (Acinetobacter, n=118; β-haemolytic streptococci, n=130; E. coli, n=141; Haemophilus influenzae type b, n=113; nontyphoidal Salmonella, n=159; pneumococci, n=390; S. aureus, n=152) and one stratum for the remaining other organisms (n=242). For the neonatal/non-neonatal disease analysis, we stratified cases as presenting in the first 28 days of life (n=195) or beyond that (n=1245). For the analysis comparing bacteraemia with and without malaria, we stratified cases with (n=204) and without (n=1236) P. falciparum on their admission blood film. For each of these analysis, case strata were compared to Affymetrix-genotyped discovery control samples (n=1895) as a baseline stratum.

For the analysis stratified across year of admission, we defined case strata by grouping into six time periods according to their date of admission: 1998–2000 (n=498), 2001–2002 (n=349), 2003–2004 (n=467), 2005–2006 (n=310), 2007–2008 (n=237), 2009–2010 (n=111). For this analysis we used both discovery (Affymetrix) and replication (Immunochip) case and control samples. This allowed better coverage of the years later in the study, which were underrepresented in the discovery samples (the discovery median admission year is 2003, cf 2005 for the replication samples). In that analysis we used multinomial logistic regression in each cohort to estimate stratum-specific effects, combining these results in a fixed effects meta-analysis using BINGWA 43Malaria Genomic Epidemiology Network et al.2015.

We then compared models of association using a Bayesian approach 50Rautanen et al.2016, considering the following models:

Additional models consider each possible combination of a fixed effect size for associated strata and no association at other strata. For each model, we calculated approximate Bayes factors 56Wakefield2009 and posterior probabilities, assuming each model to be equally likely a priori. Statistical analysis was performed in R.

eQTL mapping and colocalisation analysis

We used the eQTL Catalogue 36Kerimov et al.2021 mapping pipeline (https://github.com/eQTL-Catalogue/, 37Kerimov2022) to map eQTL in naive and stimulated monocytes 49Quach et al.2016. These data include bulk RNA-seq and genome-wide genotyping data from naive and stimulated primary monocytes isolated from individuals of European (n=100) and African (n=100) ancestry 49Quach et al.2016. Given the African-specific nature of variation at rs183868412, we performed eQTL mapping in this dataset restricting our analysis to samples of African ancestry. The eQTL Catalogue mapping pipeline has been described previously 36Kerimov et al.2021. In brief, sample genotypes (Illumina HumanOmni5-Quad genotyped) were pre-phased with Eagle v.2.4.1 42Loh et al.2016 and genotypes imputed with Minimac4 v.1.0.2 14Das et al.2016 using 1000 Genomes Phase III as a reference panel. Gene expression, exon expression, transcript usage, and transcriptional event usage were quantified from RNA-seq data using HISAT2 v.2.1.0 38Kim et al.2019, DEXSeq v.1.18.4 3Anders et al.2012, and Salmon v.0.13.1 47Patro et al.2017. Nominal mapping in cis was performed for each phenotype for variants within a 1 Mb window of the start of each gene using FastQTL 46Ongen et al.2016, including six PCs of genotyping and phenotype data as covariates in the model.

We then used the R package coloc v5.1.0 23Giambartolomei et al.2014 to identify evidence of causal variants shared by our bacterial disease-associated locus of interest and regulatory genetic variation identified in our eQTL mapping data. Coloc adopts a Bayesian approach to compare evidence for independent or shared association signals for two traits at a given genetic locus. We used the coloc.susie() command to allow colocalisation of multiple independent signals at a single locus for each trait. We considered evidence for colocalisation for each gene and exon within a 250 kb window of the peak association (rs183868412). We considered a posterior probability >0.8 supporting a shared causal locus to be significant.

References

    Systematic review of the status of pfhrp2 and pfhrp3 gene deletion, approaches and methods used for its estimation and reporting in Plasmodium falciparum populations in Africa: review of published studies 2010-201918Malaria Journal
    Death-defying immunity: do apoptotic cells influence antigen processing and presentation?4Nature Reviews. Immunology223231
    Detecting differential usage of exons from RNA-seq data22Genome Research20082017
    Insights into malaria susceptibility using genome-wide data on 17,000 individuals from Africa, Asia and Oceania10Nature Communications
    Dual role of BRUCE as an antiapoptotic IAP and a chimeric E2/E3 ubiquitin ligase14Molecular Cell801811
    Defining childhood severe falciparum malaria for intervention studies4PLOS Medicine
    A robust clustering algorithm for identifying problematic samples in genome-wide association studies28Bioinformatics134135
    Bacteremia among children admitted to a rural hospital in Kenya352The New England Journal of Medicine3947
    Mendelian susceptibility to mycobacterial disease: genetic, immunological, and clinical features of inborn errors of IFN-γ immunity26Seminars in Immunology454470
    Second-generation PLINK: rising to the challenge of larger and richer datasets4GigaScience
    Invasive bacterial co-infection in African children with Plasmodium falciparum malaria: a systematic review12BMC Medicine
    Promise and pitfalls of the Immunochip13Arthritis Research & Therapy
    Effectiveness of Haemophilus influenzae type b Conjugate vaccine introduction into routine childhood immunization in Kenya296JAMA671678
    Next-generation genotype imputation service and methods48Nature Genetics12841287
    Genome-wide association study identifies variants in the CFH region associated with host susceptibility to meningococcal disease42Nature Genetics772776
    A linear complexity phasing method for thousands of genomes9Nature Methods179181
    Autophagy in infection, inflammation and immunity13Nature Reviews. Immunology722737
    The effect of antibiotic exposure and specimen volume on the detection of bacterial pathogens in children with pneumonia64Clinical Infectious Diseases
    The IAP family member BRUCE regulates autophagosome-lysosome fusion9Nature Communications
    Plasmodium falciparum is evolving to escape malaria rapid diagnostic tests in Ethiopia6Nature Microbiology12891299
    Complement deficiency states and meningococcal disease12Immunologic Research295311
    A large proportion of P. falciparum isolates in the Amazon region of Peru lack pfhrp2 and pfhrp3: implications for malaria rapid diagnostic tests5PLOS ONE
    Bayesian test for colocalisation between pairs of genetic association studies using summary statistics10PLOS Genetics
    Risk of nontyphoidal Salmonella bacteraemia in African children is modified by STAT49Nature Communications
    Uganda genome resource enables insights into population history and genomic discovery in africa179Cell9841002
    Apollon ubiquitinates SMAC and caspase-9, and has an essential cytoprotection function6Nature Cell Biology849860
    A giant ubiquitin-conjugating enzyme related to IAP apoptosis inhibitors141The Journal of Cell Biology14151422
    Diagnosing severe falciparum malaria in parasitaemic african children: A prospective evaluation of plasma PfHRP2 measurement9PLOS Medicine
    Gene expression profiling of apoptosis regulators in patients with sepsis2Journal of Innate Immunity461468
    Sepsis-induced apoptosis causes progressive profound depletion of B and CD4+ T lymphocytes in humans166Journal of Immunology69526963
    Depletion of dendritic cells, but not macrophages, in patients with sepsis168Journal of Immunology24932500
    Apoptosis and caspases regulate death and inflammation in sepsis6Nature Reviews. Immunology813822
    A flexible and accurate genotype imputation method for the next generation of genome-wide association studies5PLOS Genetics
    Genotype imputation with thousands of genomes1G3: Genes, Genomes, Genetics457470
    Negative regulation of autophagy by UBA6-BIRC6-mediated ubiquitination of LC38eLife
    A compendium of uniformly processed human gene expression and splicing quantitative trait loci53Nature Genetics12901299
    https://github.com/eQTL-Catalogue/
    Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype37Nature Biotechnology907915
    Early circulating lymphocyte apoptosis in human septic shock is associated with poor outcome18Shock487494
    Joint sequencing of human and pathogen genomes reveals the genetics of pneumococcal meningitis10Nature Communications
    Lc3 over-expression improves survival and attenuates lung injury through increasing autophagosomal clearance in septic mice257Annals of Surgery352363
    Reference-based phasing using the Haplotype Reference Consortium panel48Nature Genetics14431448
    A novel locus of resistance to severe malaria in A region of ancient balancing selection526Nature253257
    A new multipoint method for genome-wide association studies by imputation of genotypes39Nature Genetics906913
    Human candidate gene polymorphisms and risk of severe malaria in children in Kilifi, Kenya: a case-control association study5The Lancet. Haematology
    Fast and efficient QTL mapper for thousands of molecular phenotypes32Bioinformatics14791485
    Salmon provides fast and bias-aware quantification of transcript expression14Nature Methods417419
    Inherited human IRAK-4 deficiency: an update38Immunologic Research347352
    Genetic adaptation and neandertal admixture shaped the immune system of human populations167Cell643656
    Polymorphism in a lincrna associates with a doubled risk of pneumococcal bacteremia in kenyan children98American Journal of Human Genetics10921100
    Relation between falciparum malaria and bacteraemia in Kenyan children: a population-based, case-control study and a longitudinal study378Lancet13161323
    Effect of 10-valent pneumococcal conjugate vaccine on the incidence of radiologically-confirmed pneumonia and clinically-defined pneumonia in Kenyan children: an interrupted time-series analysis7The Lancet. Global Health
    Inhibitor of apoptosis proteins and their relatives: IAPs and other BIRPs2Genome Biology
    A map of recent positive selection in the human genome4PLOS Biology
    Global burden of 369 diseases and injuries in 204 countries and territories, 1990-2019: a systematic analysis for the Global Burden of Disease Study 2019396Lancet12041222
    Bayes factors for genome-wide association studies: comparison with P-values33Genetic Epidemiology7986
    Improving statistical power in severe malaria genetic association studies by augmenting phenotypic precision10eLife
    Improving the Diagnosis of Severe Malaria in African Children Using Platelet Counts and Plasma Pf HRP2 ConcentrationsbioRxiv
    Bacteraemia in Kenyan children with sickle-cell anaemia: a retrospective cohort and case-control study374Lancet13641370
    GCTA: a tool for genome-wide complex trait analysis88American Journal of Human Genetics7682