Monocyte-derived transcriptome signature indicates antibody-dependent cellular phagocytosis as a potential mechanism of vaccine-induced protection against HIV-1

  1. US Military HIV Research Program (MHRP), Walter Reed Army Institute of Research
    Silver SpringUnited States
  2. Henry M. Jackson Foundation for the Advancement of Military Medicine
    BethesdaUnited States
  3. Department of Pathology and Laboratory Medicine, Emory University
    AtlantaUnited States
  4. Vaccine Research Center, NIH
    BethesdaUnited States
  5. Departments of Surgery, Immunology and Molecular Genetics and Microbiology, Duke University School of Medicine
    DurhamUnited States
  6. Vaccine Trial Centre, Faculty of Tropical Medicine, Mahidol University
    BangkokThailand
  7. Armed Forces Research Institute of Medical Sciences
    BangkokThailand
  8. Research Institute for Health Sciences and Faculty of Public Health, Chiang Mai University
    Chiang MaiThailand
  9. Department of Disease Control, Ministry of Public Health
    NonthaburiThailand
  10. Fred Hutchinson Cancer Research Center
    SeattleUnited States

Abstract

A gene signature was previously found to be correlated with mosaic adenovirus 26 vaccine protection in simian immunodeficiency virus and simian-human immunodeficiency virus challenge models in non-human primates. In this report, we investigated the presence of this signature as a correlate of reduced risk in human clinical trials and potential mechanisms of protection. The absence of this gene signature in the DNA/rAd5 human vaccine trial, which did not show efficacy, strengthens our hypothesis that this signature is only enriched in studies that demonstrated protection. This gene signature was enriched in the partially effective RV144 human trial that administered the ALVAC/protein vaccine, and we find that the signature associates with both decreased risk of HIV-1 acquisition and increased vaccine efficacy (VE). Total RNA-seq in a clinical trial that used the same vaccine regimen as the RV144 HIV vaccine implicated antibody-dependent cellular phagocytosis (ADCP) as a potential mechanism of vaccine protection. CITE-seq profiling of 53 surface markers and transcriptomes of 53,777 single cells from the same trial showed that genes in this signature were primarily expressed in cells belonging to the myeloid lineage, including monocytes, which are major effector cells for ADCP. The consistent association of this transcriptome signature with VE represents a tool both to identify potential mechanisms, as with ADCP here, and to screen novel approaches to accelerate the development of new vaccine candidates.

Introduction

The only HIV vaccine in ongoing human efficacy trials employs an adenovirus serotype 26 (Ad26)-based vector, but tests a different route of infection and geographic population compared to African women in whom it failed to show efficacy 30Mega201935NIH201737NIH2021. Vaccines from these trials were previously tested in non-human primates (NHPs) and showed partial protection from infection 8Barouch et al.20159Barouch et al.2018. To date, the pivotal RV144 phase three human efficacy trial conducted in Thailand is the only vaccine to show any protection against HIV 43Rerks-Ngarm et al.2009. This vaccine used a canary-pox ALVAC-based vector with a bivalent gp120 protein boost. Although neither of the vaccine regimens using Ad26 or canary-pox viral vectors were fully efficacious, there is some consensus that current preventive and treatment methods along with a moderately effective vaccine could potentially reduce the HIV pandemic 4Anderson et al.19955Andersson et al.200717Fauci201729Medlock et al.2017. While a number of correlates of vaccine protection have been described for these studies, protection mediated by the humoral immune systems, including HIV-1 specific IgG antibody titers, antibody Fc polyfunctionality, antibody interactions with HLA class II gene products, and antibody effector functions, has been key features of these partially effective vaccines 8Barouch et al.20159Barouch et al.201824Haynes et al.201241Prentice et al.2015.

We previously showed that a vaccine-induced gene signature identified in B cells by an unbiased transcriptome-wide RNA-seq approach associated with decreased risk against simian immunodeficiency virus (SIV)/simian-human immunodeficiency virus (SHIV) infection in NHP studies evaluating the Ad26 vaccine 16Ehrenberg et al.2019. This geneset was also enriched in NHP and the human RV144 trial that employed a vaccine containing the ALVAC viral vector 16Ehrenberg et al.2019. This gene signature is not merely a general response elicited by vaccination, as it was not enriched in the Ad26-MVA arm of the SHIV challenge in NHP that showed some protection previously 9Barouch et al.2018. This gene signature was initially defined when comparing differentially expressed genes (DEGs) between B cells and monocytes from vaccinated individuals in an Influenza immunogenicity trial 33Nakaya et al.2011. The geneset that was submitted to the molecular signature database (MSigdb) comprised the top 200 genes that were upregulated in monocytes compared to B cells. In our previous study, specific genes in this geneset were upregulated in the uninfected compared to the infected group in multiple SIV/HIV trials 16Ehrenberg et al.2019. Genes that were previously correlated with immunogenicity in human vaccine trials of influenza and yellow fever, including TNFSF13 (APRIL), were also enriched in uninfected rhesus monkeys in the NHP studies 28Li et al.201433Nakaya et al.2011. Although we first identified this geneset in sorted B cells 16Ehrenberg et al.2019, we were able to identify the same signature associating with reduced infection in published microarray data sets from both bulk unstimulated and in vitro antigen-stimulated PBMC from three independent preclinical and clinical studies 19Fourati et al.201956Vaccari et al.201855Vaccari et al.2016. To determine if specific immune responses might be driving protection in conjunction with the gene signature, we examined whether this geneset associated with these responses measured in the NHP studies. We observed that this gene signature was also enriched in animals with the increased magnitude of ADCP 16Ehrenberg et al.2019. We propose that this gene signature is a correlate of reduced risk of infection in efficacy studies and that further investigation of the enriched genes in the geneset could potentially help uncover the mechanism of vaccine protection. Here, we investigate this gene signature further as a proxy of vaccine-induced protection in human clinical trials to identify the cellular origin, as well as to investigate potential mechanisms for the decreased risk of infection.

Results

Gene signature is absent in a human HIV vaccine trial that did not show efficacy

Since the gene signature associated with protection within the vaccinated group in multiple studies from different sources and regimens, we wanted to further confirm that this signature was truly associated with protection by looking for its presence or absence in a human vaccine trial that failed to show efficacy (Supplementary file 1a). We screened for this gene signature in whole-transcriptome data within participants vaccinated with the DNA/rAd5 HIV-1 preventive vaccine in the HVTN 505 human efficacy trial. Immunizations in this trial were halted prior to reaching the clinical endpoint due to lack of efficacy 23Hammer et al.2013. When comparing infection status within vaccinated individuals, enrichment of this gene signature, as defined by the normalized enrichment score (NES), was not significant in transcriptome data from sorted B cells or monocytes 1 month after the final immunization (NES=–1.18, p=0.09 and NES=1.12, p=0.18, respectively). This finding further supports our hypothesis that this gene signature is associated with protection, as summarized in Table 1.

Gene signature associates with vaccine protection in multiple trials.

Study Vaccine regimen Species Partial protection N Method Protective signature
09–11 Ad26/gp140 NHP Y 10 RNA-seq Y
13–19 Ad26/gp140 NHP Y 11 RNA-seq Y
13–19 A26/Ad26+ gp140 NHP Y 12 RNA-seq Y
13–19 Ad26/MVA+ gp140 NHP Y 9 RNA-seq N
ALVAC-SIV/gp120 NHP Y 27 Microarray Y
DNA-SIV/ALVAC+ gp120 NHP Y 12 Microarray Y
RV144 ALVAC/gp120 Human Y 170 Microarray Y
HVTN 505 DNA/rAd5 Human N 42 RNA-seq N

Genes from this signature are the strongest correlate of protection in RV144

In previous analyses of NHP preclinical studies, we utilized a composite gene expression score (GES) consisting of an average of standardized expression of the specific number of enriched genes in one study to predict infection status in an independent study using the overlapping expressed genes from the first study 16Ehrenberg et al.2019. While this method is successful in evaluating gene signatures in studies using similar vaccine strategies, we wanted to explore this approach across different studies using diverse platforms. For each independent study, we computed a GES derived from genes within the geneset that were enriched in uninfected donors by averaging standardized expression and showed that it associates with decreased HIV-1 infection (Figure 1). The magnitude of the GES and total number of enriched genes present in the gene signature are specific to each study and are higher in the uninfected compared to infected animals in the two NHP preclinical trials evaluating the mosaic Ad26 vaccine (09–11 and 13–19), including the different arms of the 13–19 study (13–19 a-b) (Figure 1A–C). Further, in the RV144 trial, the GES of 63 enriched genes in the gene signature also was higher in the vaccinated individuals that remained uninfected (Figure 1D). The number of enriched genes in each study might vary due to global differences in the vaccine strategies, but we consistently observed that higher GES associated with protection from HIV acquisition. We took advantage of the composite GES measurement to compare it with the other known primary correlates of HIV-1 infection risk in the human RV144 trial. IgG antibodies binding to the variable regions 1–2 (V1V2) of the HIV-1 Envelope (Env) have been shown to correlate with decreased risk of infection, while IgA binding to Env associated with increased risk of infection 24Haynes et al.2012. We show that the association of the GES in RV144 is a stronger correlate of reduced risk of infection than the previously described V1V2-specific IgG antibodies (Figure 2A). Cumulative incidence curves of HIV-1 infection showed decreased rates of infection among vaccine recipients with high GES (Figure 2B). Estimated vaccine efficacy (VE) was higher among vaccine recipients with higher GES (Figure 2C). The distribution of area under the receiver operating characteristic curve (AUC) and accuracy suggested that GES was also able to predict HIV-1 infection (Figure 2D). The effect of GES was also tested in RV144 vaccine and placebo participants who became infected during the trial 45Rolland et al.2012. If the GES was associated with VE, we would expect that vaccinees with a high GES would not get infected, hence vaccinees who became infected should have lower GES than placebo participants (who reflect the entire distribution of GES). This was observed across 43 breakthrough participants, with a significant difference among participants infected with single HIV-1 founder variants (N=29) (Figure 2—figure supplement 1). These findings strengthen the hypothesis that the GES is associated with VE.

Composite gene expression scores (GES) are higher in the uninfected compared to infected groups.

GES computed from enriched genes in the geneset is higher in the uninfected compared to infected vaccinated NHP and humans. (A) Ad26/gp140 (09–11 NHP SIV challenge study, 58 enriched genes, N=10), (B) Ad26/gp140 (13–19 NHP SHIV challenge study, 58 enriched genes, N=11), (C) Ad26/Ad26+ gp140 (13–19 NHP SHIV challenge study, 68 enriched genes, N=12), and (D) ALVAC/gp 120 (RV144 human efficacy trial, 63 enriched genes, N=170). Statistical significance was calculated by either Mann-Whitney or unpaired t-test. NHP, non-human primate; SHIV, simian-human immunodeficiency virus; SIV, simian immunodeficiency virus.

library(ggplot2)
library(knitr)
library(plyr)
library(tidyverse)
library(cowplot)
library(magrittr)
library(dplyr)
library(readxl)
library(parallel)

library(osDesign)
library(gplots)
library(rockchalk)
library(varhandle)
library(lattice)
library(forestplot)
library(survival)
library(fmsb)
library(InformationValue)
library(caret)
library(verification)
library(ROCR)
library(pROC)
library(extrafont)
library(huxtable)
library(Seurat)
library(ggvenn)
library(pheatmap)
library(ggplotify)

library(BiocGenerics)
library(Biobase)

load("data/RV144_GES_Analysis.RData")
flrstatus <- ifelse(mydata_rv144_vac_enriched63_170$infect=="Yes",1,0)
sex <-ifelse(mydata_rv144_vac_enriched63_170$dem_sex=="Female",1,0)
risk.cat <-ifelse(mydata_rv144_vac_enriched63_170$BRA_risk=="Low",0,ifelse(mydata_rv144_vac_enriched63_170$BRA_risk=="Medium",1,2)) 
risk.medium <- ifelse(mydata_rv144_vac_enriched63_170$BRA_risk=="Medium",1,0)
risk.high <- ifelse(mydata_rv144_vac_enriched63_170$BRA_risk=="High",1,0)
EIGEN1 <-  mydata_rv144_vac_enriched63_170$EIGEN1
# making GES using scaled gene expression data (the enriched 63 genes)
trans63= rowMeans(mydata_rv144_vac_enriched63_170[,64:126])

####################### two phase design ##############################

nn0 <- c(length(dat1$pin[dat1$infect=="No" & dat1$cc_cohort==1 & dat1$trt=="VACCINE" & dat1$dem_sex=="Female" & dat1$perprot=="Yes" & dat1$vaccno==4]),
         length(dat1$pin[dat1$infect=="No" & dat1$cc_cohort==1 & dat1$trt=="VACCINE" & dat1$dem_sex=="Male" & dat1$perprot=="Yes" & dat1$vaccno==4]),
         length(dat1$pin[dat1$infect=="No" & dat1$cc_cohort==1 & dat1$trt=="VACCINE" & dat1$dem_sex=="Female" & dat1$perprot=="No" & dat1$vaccno==4]),
         length(dat1$pin[dat1$infect=="No" & dat1$cc_cohort==1 & dat1$trt=="VACCINE" & dat1$dem_sex=="Male" & dat1$perprot=="No" & dat1$vaccno==4]),
         length(dat1$pin[dat1$infect=="No" & dat1$cc_cohort==1 & dat1$trt=="VACCINE" & dat1$dem_sex=="Female" & dat1$perprot=="No" & dat1$vaccno<4]))

nn1 <- c(length(dat1$pin[dat1$infect=="Yes" & dat1$cc_cohort==1 & dat1$trt=="VACCINE" & dat1$dem_sex=="Female" & dat1$perprot=="Yes" & dat1$vaccno==4]),
         length(dat1$pin[dat1$infect=="Yes" & dat1$cc_cohort==1 & dat1$trt=="VACCINE" & dat1$dem_sex=="Male" & dat1$perprot=="Yes" & dat1$vaccno==4]),
         length(dat1$pin[dat1$infect=="Yes" & dat1$cc_cohort==1 & dat1$trt=="VACCINE" & dat1$dem_sex=="Female" & dat1$perprot=="No" & dat1$vaccno==4]),
         length(dat1$pin[dat1$infect=="Yes" & dat1$cc_cohort==1 & dat1$trt=="VACCINE" & dat1$dem_sex=="Male" & dat1$perprot=="No" & dat1$vaccno==4]),
         length(dat1$pin[dat1$infect=="Yes" & dat1$cc_cohort==1 & dat1$trt=="VACCINE" & dat1$dem_sex=="Female" & dat1$perprot=="No" & dat1$vaccno<4]))


stratuminds <- ifelse(mydata_rv144_vac_enriched63_170$dem_sex=="Female" & mydata_rv144_vac_enriched63_170$perprot=="Yes" & mydata_rv144_vac_enriched63_170$vaccno==4,1,
                      ifelse(mydata_rv144_vac_enriched63_170$dem_sex=="Male" & mydata_rv144_vac_enriched63_170$perprot=="Yes" & mydata_rv144_vac_enriched63_170$vaccno==4,2,
                             ifelse(mydata_rv144_vac_enriched63_170$dem_sex=="Female" & mydata_rv144_vac_enriched63_170$perprot=="No" & mydata_rv144_vac_enriched63_170$vaccno==4,3,
                                    ifelse(mydata_rv144_vac_enriched63_170$dem_sex=="Male" & mydata_rv144_vac_enriched63_170$perprot=="No" & mydata_rv144_vac_enriched63_170$vaccno==4,4,5))))
scaled_trans63=scale(trans63)
fit.tps <- tps(flrstatus ~ scaled_trans63 + EIGEN1  + sex+ risk.medium+ risk.high,nn0=nn0, nn1=nn1,group=stratuminds,method="ML",cohort=TRUE)

x <- as.matrix(cbind(round(exp(fit.tps$coef[2]),2),round(exp(fit.tps$coef[2] - sqrt(fit.tps$covm[2,2])*1.96),2),
                     round(exp(fit.tps$coef[2] + sqrt(fit.tps$covm[2,2])*1.96),2),round(min(2*(1-pnorm(abs(fit.tps$coef[2]/sqrt(fit.tps$covm[2,2])))),1.0)
                                                                                        ,4)))
colnames(x)=c("OR","95%CI LB","95%CI UB", "p value")
rownames(x)="GES"
x

myforest=matrix(NA, 3, 4)
rownames(myforest)=c("GES","gp70-V1V2 (IgG)", "IgA (Env)")

myforest=as.data.frame(myforest)
colnames(myforest)=c("OR","95%CI LB","95%CI UB","p value")
myforest[1,]=c(0.37,0.22,0.63,"0.0002")
myforest[2,]=c(0.7,0.49,1.02,0.06)
myforest[3,]=c(1.39,1.00,1.93,0.05)


tabletext <- cbind(c("Immune Response",rownames(myforest)),
                   
                   c("p value",myforest$`p value`))


forestplot(labeltext=tabletext, graph.pos=2, 
           mean=c(NA,myforest$OR), 
           lower=c(NA,myforest$`95%CI LB`), upper=c(NA,myforest$`95%CI UB`),
        
           txt_gp=fpTxtGp(label=gpar(cex=0.7),
                          ticks=gpar(cex=0.5),
                          xlab=gpar(cex = 0.7),
                          title=gpar(cex = 1.0)),
           col=fpColors(box="black", lines="black", zero = "gray60"),
           zero=1,  lineheight = "auto", boxsize=0.11, colgap=unit(2.5,"mm"),
           lwd.ci=2, ci.vertices=TRUE, ci.vertices.height = 0.05,xticks = c(0,0.25,0.5,0.75,1,1.25, 1.5, 1.75, 2.00), xlab="\n Odds Ratio")

GES is associated with lower odds of HIV acquisition compared to the other two primary correlates of risk. Variables were measured at week 26, 2 weeks post last vaccination. For each variable, the OR is reported per 1-SD increase. Transcriptome data was available only in a subset of the 246 donors.

############## Cumulative Incidence Curve #####################
########### GES63 low med high #########

alpha1.0 <- length(unique(mydata_rv144_vac_enriched63_170$pin[mydata_rv144_vac_enriched63_170$trt=="VACCINE" & mydata_rv144_vac_enriched63_170$infect=="No" & mydata_rv144_vac_enriched63_170$dem_sex=="Female" & mydata_rv144_vac_enriched63_170$perprot=="Yes" & mydata_rv144_vac_enriched63_170$vaccno==4]))/
  length(dat1$pin[dat1$trt=="VACCINE" & dat1$infect=="No" & dat1$cc_cohort==1 & dat1$dem_sex=="Female" & dat1$perprot=="Yes" & dat1$vaccno==4])

alpha2.0 <- length(unique(mydata_rv144_vac_enriched63_170$pin[mydata_rv144_vac_enriched63_170$trt=="VACCINE" & mydata_rv144_vac_enriched63_170$infect=="No" & mydata_rv144_vac_enriched63_170$dem_sex=="Male" & mydata_rv144_vac_enriched63_170$perprot=="Yes" & mydata_rv144_vac_enriched63_170$vaccno==4]))/
  length(dat1$pin[dat1$trt=="VACCINE" & dat1$infect=="No" & dat1$cc_cohort==1 & dat1$dem_sex=="Male" & dat1$perprot=="Yes" & dat1$vaccno==4])

alpha3.0 <- length(unique(mydata_rv144_vac_enriched63_170$pin[mydata_rv144_vac_enriched63_170$trt=="VACCINE" & mydata_rv144_vac_enriched63_170$infect=="No" & mydata_rv144_vac_enriched63_170$dem_sex=="Female" & mydata_rv144_vac_enriched63_170$perprot=="No" & mydata_rv144_vac_enriched63_170$vaccno==4]))/
  length(dat1$pin[dat1$trt=="VACCINE" & dat1$infect=="No" & dat1$cc_cohort==1 & dat1$dem_sex=="Female" & dat1$perprot=="No" & dat1$vaccno==4])

alpha4.0 <- length(unique(mydata_rv144_vac_enriched63_170$pin[mydata_rv144_vac_enriched63_170$trt=="VACCINE" & mydata_rv144_vac_enriched63_170$infect=="No" & mydata_rv144_vac_enriched63_170$dem_sex=="Male" & mydata_rv144_vac_enriched63_170$perprot=="No" & mydata_rv144_vac_enriched63_170$vaccno==4]))/
  length(dat1$pin[dat1$trt=="VACCINE" & dat1$infect=="No" & dat1$cc_cohort==1 & dat1$dem_sex=="Male" & dat1$perprot=="No" & dat1$vaccno==4])

alpha5.0 <- length(unique(mydata_rv144_vac_enriched63_170$pin[mydata_rv144_vac_enriched63_170$trt=="VACCINE" & mydata_rv144_vac_enriched63_170$infect=="No" & mydata_rv144_vac_enriched63_170$dem_sex=="Female" & mydata_rv144_vac_enriched63_170$perprot=="No" & mydata_rv144_vac_enriched63_170$vaccno<4]))/
  length(dat1$pin[dat1$trt=="VACCINE" & dat1$infect=="No" & dat1$cc_cohort==1 & dat1$dem_sex=="Female" & dat1$perprot=="No" & dat1$vaccno<4])



cat63_3.vp <- as.factor(dat.vcc.allplc_3$cat63_3.vp)
flrtime.vp <- dat.vcc.allplc_3$time_to_infect - dat.vcc.allplc_3$time_to_wk26
flrstatus.vp <- ifelse(dat.vcc.allplc_3$infect=="Yes",1,0)
cat63.vp <- dat.vcc.allplc_3$cat63.vp
IGG.vp <- dat.vcc.allplc_3$IGG.vp
IGG_DPB13.vp <- dat.vcc.allplc_3$IGG_DPB13.vp
DPB13.vp <- dat.vcc.allplc_3$DPB13.vp


wt <- ifelse(flrstatus.vp ==1 |dat.vcc.allplc_3$trt=="PLACEBO",1,
             ifelse(dat.vcc.allplc_3$trt=="VACCINE" & flrstatus.vp==0 & dat.vcc.allplc_3$dem_sex=="Female" & dat.vcc.allplc_3$perprot=="Yes" & 
                      dat.vcc.allplc_3$vaccno==4,1/alpha1.0,
                    ifelse(dat.vcc.allplc_3$trt=="VACCINE" & flrstatus.vp==0 & dat.vcc.allplc_3$dem_sex=="Male" & dat.vcc.allplc_3$perprot=="Yes" & 
                             dat.vcc.allplc_3$vaccno==4,1/alpha2.0,
                           ifelse(dat.vcc.allplc_3$trt=="VACCINE" & flrstatus.vp==0 & dat.vcc.allplc_3$dem_sex=="Female" & dat.vcc.allplc_3$perprot=="No" & 
                                    dat.vcc.allplc_3$vaccno==4,1/alpha3.0,
                                  ifelse(dat.vcc.allplc_3$trt=="VACCINE" & flrstatus.vp==0 & dat.vcc.allplc_3$dem_sex=="Male" & dat.vcc.allplc_3$perprot=="No" & 
                                           dat.vcc.allplc_3$vaccno==4,1/alpha4.0,1/alpha5.0)))))


###################################################
# KM plots by GES (low vs. med vs. high)  
output.legend <- NULL
scl <- 365.25/12
new <- cat63_3.vp
levels(new)=c("0","3","1","2")
new = factor(new,levels(new)[c(1,3,4,2)])
V1=new


par(mfrow=c(1,1),mar=c(5,4,2,3),las=1,cex.axis=1.1,cex.lab=1.1,cex.main=1.1,oma=c(4,2,0,0))
KM <- survfit(Surv(flrtime.vp/scl, flrstatus.vp)~V1, type="kaplan-meier",weights=wt)
sKM <- summary(KM,times=c(0,12,24,36,48),extend=T)
n.events.total <- tapply(sKM$n.event,rep(1:5,each=4),sum)
n.event <- matrix(round(sKM$n.event),nrow=4,byrow=T)
n.atrisk <- matrix(round(sKM$n.risk),nrow=4,byrow=T)
ptime <- unique(sKM$time)
output.legend <- data.frame(Variable=rep("GES63",8),Type=rep(c("n.atrisk","n.event"),each=4),
                            Category=rep(c("Placebo","Low GES","Medium GES","High GES"),2),rbind(n.atrisk,n.event))
plot(KM,fun="event",col=c("black","red","blue","purple4"),lty=c(1,1,1,1),xlab="Months since the week 26 visit",ylab="",lwd=4.7,axes=F,xlim=c(0,40),mark.time=TRUE) 
#str(V1)
legend(0, .0075, c("Placebo", "Low GES","Medium GES","High GES"), lty=c(1,1,1,1),lwd=c(2,2,2),col=c("black","red","blue","purple4"),bty="n",cex=0.8,y.intersp=0.8)
box(bty="l")
axis(1,at=seq(0,40,by=12),label=c("0","12","24","36"))
axis(2,at=seq(0,0.01,by=0.002),label=c( "0.000", "0.002", "0.004", "0.006", "0.008"," 0.010"))
mtext(side=1,at=-10,text="Number at risk:",line=4,cex=1.0,adj=0)
mtext(side=1,at=-10,text="Placebo",line=5,cex=1.0,adj=0)
mtext(side=1,at=-10,text="Low GES",line=6,cex=1.0,adj=0)
mtext(side=1,at=-10,text="Medium GES",line=7,cex=1.0,adj=0)
mtext(side=1,at=-10,text="High GES",line=8,cex=1.0,adj=0)

mtext(side=1,text=n.atrisk[1,],line=5,at=ptime,cex=1.0)
mtext(side=1,text=n.atrisk[2,],line=6,at=ptime,cex=1.0)
mtext(side=1,text=n.atrisk[3,],line=7,at=ptime,cex=1.0)
mtext(side=1,text=n.atrisk[4,],line=8,at=ptime,cex=1.0)

mtext(side=1,text=paste("(",n.event[1,],")"),line=5,at=ptime+3,cex=0.9)
mtext(side=1,text=paste("(",n.event[2,],")"),line=6,at=ptime+3,cex=0.9)
mtext(side=1,text=paste("(",n.event[3,],")"),line=7,at=ptime+3,cex=0.9)
mtext(side=1,text=paste("(",n.event[4,],")"),line=8,at=ptime+3,cex=0.9)

mtext(side=2,"Probability of acquiring HIV-1 infection",cex=1.1,outer=T,las=3,line=1)

Probability of acquiring HIV-1 is lower in individuals with higher GES.

flrtime.vp <- dat.vcc.allplc_3$time_to_infect - dat.vcc.allplc_3$time_to_wk26
flrstatus.vp <- ifelse(dat.vcc.allplc_3$infect=="Yes",1,0)
sex.vp <- ifelse(dat.vcc.allplc_3$dem_sex=="Female",1,0)
risk.cat.vp <- ifelse(dat.vcc.allplc_3$BRA_risk=="Low",0,ifelse(dat.vcc.allplc_3$BRA_risk=="Medium",1,2)) 
risk.medium.vp <- ifelse(dat.vcc.allplc_3$BRA_risk=="Medium",1,0)
risk.high.vp <- ifelse(dat.vcc.allplc_3$BRA_risk=="High",1,0)

############# two-phase design  ################
nn0.vp <- c(length(dat1$pin[dat1$infect=="No" & dat1$cc_cohort==1 & dat1$trt=="VACCINE" & dat1$dem_sex=="Female" & dat1$perprot=="Yes" & dat1$vaccno==4]),
            length(dat1$pin[dat1$infect=="No" & dat1$cc_cohort==1 & dat1$trt=="VACCINE" & dat1$dem_sex=="Male" & dat1$perprot=="Yes" & dat1$vaccno==4]),
            length(dat1$pin[dat1$infect=="No" & dat1$cc_cohort==1 & dat1$trt=="VACCINE" & dat1$dem_sex=="Female" & dat1$perprot=="No" & dat1$vaccno==4]),
            length(dat1$pin[dat1$infect=="No" & dat1$cc_cohort==1 & dat1$trt=="VACCINE" & dat1$dem_sex=="Male" & dat1$perprot=="No" & dat1$vaccno==4]),
            length(dat1$pin[dat1$infect=="No" & dat1$cc_cohort==1 & dat1$trt=="VACCINE" & dat1$dem_sex=="Female" & dat1$perprot=="No" & dat1$vaccno<4]),
            length(dat1$pin[dat1$infect=="No" & dat1$cc_cohort==1 & dat1$trt=="PLACEBO" & (dat1$time_to_infect > dat1$time_to_wk26)])
)




nn1.vp <- c(length(dat1$pin[dat1$infect=="Yes" & dat1$cc_cohort==1 & dat1$trt=="VACCINE" & dat1$dem_sex=="Female" & dat1$perprot=="Yes" & dat1$vaccno==4]),
            length(dat1$pin[dat1$infect=="Yes" & dat1$cc_cohort==1 & dat1$trt=="VACCINE" & dat1$dem_sex=="Male" & dat1$perprot=="Yes" & dat1$vaccno==4]),
            length(dat1$pin[dat1$infect=="Yes" & dat1$cc_cohort==1 & dat1$trt=="VACCINE" & dat1$dem_sex=="Female" & dat1$perprot=="No" & dat1$vaccno==4]),
            length(dat1$pin[dat1$infect=="Yes" & dat1$cc_cohort==1 & dat1$trt=="VACCINE" & dat1$dem_sex=="Male" & dat1$perprot=="No" & dat1$vaccno==4]),
            length(dat1$pin[dat1$infect=="Yes" & dat1$cc_cohort==1 & dat1$trt=="VACCINE" & dat1$dem_sex=="Female" & dat1$perprot=="No" & dat1$vaccno<4]),
            length(dat1$pin[dat1$infect=="Yes" & dat1$cc_cohort==1 & dat1$trt=="PLACEBO" & (dat1$time_to_infect > dat1$time_to_wk26)]))



stratuminds.vp <- ifelse(dat.vcc.allplc_3$trt=="VACCINE" & dat.vcc.allplc_3$dem_sex=="Female" & dat.vcc.allplc_3$perprot=="Yes" & dat.vcc.allplc_3$vaccno==4,1,
                         ifelse(dat.vcc.allplc_3$trt=="VACCINE" & dat.vcc.allplc_3$dem_sex=="Male" & dat.vcc.allplc_3$perprot=="Yes" & dat.vcc.allplc_3$vaccno==4,2,
                                ifelse(dat.vcc.allplc_3$trt=="VACCINE" & dat.vcc.allplc_3$dem_sex=="Female" & dat.vcc.allplc_3$perprot=="No" & dat.vcc.allplc_3$vaccno==4,3,
                                       ifelse(dat.vcc.allplc_3$trt=="VACCINE" & dat.vcc.allplc_3$dem_sex=="Male" & dat.vcc.allplc_3$perprot=="No" & dat.vcc.allplc_3$vaccno==4,4,
                                              ifelse(dat.vcc.allplc_3$trt=="VACCINE" & dat.vcc.allplc_3$dem_sex=="Female" & dat.vcc.allplc_3$perprot=="No" & dat.vcc.allplc_3$vaccno<4,5,6)))))


fit.tps.vp.cat63 <- tps(flrstatus.vp ~ factor(new) + sex.vp + risk.medium.vp + risk.high.vp,nn0=nn0.vp, 
                        nn1=nn1.vp,group=stratuminds.vp,method="ML",cohort=TRUE, data=dat.vcc.allplc_3)

out1 <- fit.tps.vp.cat63$coef
out2 <- diag(fit.tps.vp.cat63$covm)


I = 2
q =  t(fit.tps.vp.cat63$coef[I])%*%solve(fit.tps.vp.cat63$covm[c(I),c(I)])%*%fit.tps.vp.cat63$coef[I] 
p.one = 1 - pchisq(q, 1) 
I = 3
q =  t(fit.tps.vp.cat63$coef[I])%*%solve(fit.tps.vp.cat63$covm[c(I),c(I)])%*%fit.tps.vp.cat63$coef[I] 
p.two = 1 - pchisq(q, 1) 

I = 4
q =  t(fit.tps.vp.cat63$coef[I])%*%solve(fit.tps.vp.cat63$covm[c(I),c(I)])%*%fit.tps.vp.cat63$coef[I] 
p.three = 1 - pchisq(q, 1) 
par(mar=c(3.5, 5, 2.1, 2.1), bty="l")

plotCI(x=1-exp(out1[2:4]),ui= 1-exp(out1[2:4] - 1.96*sqrt(out2[2:4])), li= 1-exp(out1[2:4] + 1.96*sqrt(out2[2:4])), 
       col=c("red","blue","purple4"), barcol=c("red","blue","purple4"), lwd=3, xaxt="n", yaxt="n", xlab="", ylab="",
       xlim=c(0.25, 3.55), ylim=c(-2,1), cex.lab=1.4, cex.main=1.4)


axis(2, at=seq(-2,1,by=0.5), labels=paste(seq(-200,100,by=50), "%", sep=""), lwd.ticks=2, tcl=-0.6, las=2, cex.axis=1.1)
axis(2, at=seq(-2,1,by=0.1), labels=NA, cex.axis=1.1, tcl=-0.4)
axis(side=1, at=1:3, labels=c("Low","Medium","High"), cex.axis=1.1)

abline(h=0,lty=2, col=2) 
title(ylab="Estimated VE", cex.lab=1.2, line=4)
title(xlab="GES", cex.lab=1.2, line=2.2)

pvals = sapply(c(p.one, p.two,p.three), sprintf, fmt="p = %0.2f")
text(x=1:3, y=rep(0.98,1), pvals)
text(x=c(1.3,2.3,3.3), y=1-exp(out1[2:4]), paste(round(100*(1-exp(out1[2:4])),2),"%",sep=""))
text(x=c(1.3,2.3,3.3), y=1-exp(out1[2:4] - 1.96*sqrt(out2[2:4])), paste(round(100*(1-exp(out1[2:4] - 1.96*sqrt(out2[2:4]))),2),"%",sep=""))
text(x=c(1.3,2.3,3.3), y=1-exp(out1[2:4] + 1.96*sqrt(out2[2:4])), paste(round(100*(1-exp(out1[2:4] + 1.96*sqrt(out2[2:4]))),2),"%",sep=""))

Vaccine efficacy is increased significantly in individuals with high GES.

######## repeat data partition 1000: caculate accuracy & auc #######

auc <- NULL
p_auc <- NULL
k <- 1000
acc <- NULL
cutp <- NULL
pbar <- create_progress_bar('text')
pbar$init(k)

set.seed(123)

for(i in 1:k)
{
  # Train-test splitting
  # 70% of samples -> training
  # 30% of samples -> testing
  trainIndex <- createDataPartition(mydata_rv144_vac_enriched63_170$infect, p = .7, 
                                    list = FALSE)
  train <- mydata_rv144_vac_enriched63_170[trainIndex,]
  test <- mydata_rv144_vac_enriched63_170[-trainIndex,]
  
  flrstatus <- ifelse(test$infect=="Yes",1,0)
  sex <-ifelse(test$dem_sex=="Female",1,0)
  risk.cat <-ifelse(test$BRA_risk=="Low",0,ifelse(test$BRA_risk=="Medium",1,2)) 
  risk.medium <- ifelse(test$BRA_risk=="Medium",1,0)
  risk.high <- ifelse(test$BRA_risk=="High",1,0)
  EIGEN1 <-  test$EIGEN1
  trans63=test$trans_63
  test_data=as.data.frame(cbind(flrstatus,sex,risk.medium,risk.high,EIGEN1,trans63))
  rownames(test_data)=rownames(test)
  
  flrstatus <- ifelse(train$infect=="Yes",1,0)
  sex <-ifelse(train$dem_sex=="Female",1,0)
  risk.cat <-ifelse(train$BRA_risk=="Low",0,ifelse(train$BRA_risk=="Medium",1,2)) 
  risk.medium <- ifelse(train$BRA_risk=="Medium",1,0)
  risk.high <- ifelse(train$BRA_risk=="High",1,0)
  EIGEN1 <-  train$EIGEN1
  trans63=train$trans_63
  train_data=as.data.frame(cbind(flrstatus,sex,risk.medium,risk.high,EIGEN1,trans63))
  rownames(train_data)=rownames(train)
  
  model <- glm(flrstatus~trans63+ EIGEN1 + sex + risk.medium + risk.high,family = binomial(link = "logit"),data = train_data)
  
  cutp <-   optimalCutoff(actuals = train_data$flrstatus, predictedScores = predict(model, train_data, type="response"),  optimiseFor = "misclasserror") 

    
  
  results_prob <- predict(model, test_data, type="response")
  
  results <- ifelse(results_prob > cutp,1,0)
  
  answers <- test_data$flrstatus
  
  misClasificError <- mean(answers != results)
  
  acc[i] <- 1-misClasificError
  auc[i]=roc.area(answers,results_prob)$A
 
  pbar$step()
}

hist(auc,
     col="green4",
     border="green4",
     density=70,
     probability = TRUE,
     xlim = c(0.3,1),
     ylim = c(0,15),
     xlab = "",
     breaks = 30,
     main = "")
par(new=TRUE)
plot(density(acc, bw = 0.02), xlim = c(0.3,1),
     ylim = c(0,15),main="",xlab="AUC & Accuracy",col="orange",lwd=2,bty="n")
legend(x=0.9,y=15, "AUC", fill="green4",density=70, border="green4",bty="n",cex = 0.8)
legend(x=0.9,y=14, "Accuracy", col="orange",lty = 1 ,lwd=2, seg.len=1, bty="n",cex = 0.8)

Distribution of AUC and accuracy plotted after repeating the process 1000 times showed that GES could predict HIV-1 infection with AUC of 0.67±0.08 and with accuracy of 0.81±0.04. GES, gene expression score.

Association of the GES with HIV-1 breakthrough infections in a human vaccine trial.

GES was significantly higher in the placebo (blue) compared to vaccine recipients (red) among the RV144 participants with single-founder breakthrough infection (p<0.05). Numbers indicate the number of participants plotted. Asterisks indicate significant pairwise differences by the Mann-Whitney test. GES, gene expression score.

Gene signature associates with an antibody effector function in a human vaccine trial

Immune responses correlating with this signature can provide additional insights into mechanisms that could be harnessed to improve vaccine design. We previously showed in the NHP studies that the protective gene signature that was enriched in uninfected monkeys after Ad26/gp140 vaccination was also associated with higher magnitude of ADCP 16Ehrenberg et al.2019. In the RV144 human trial a number of immunological parameters were previously measured as part of the immune-correlates analysis, but not ADCP. The RV306 immunogenicity trial that employed a similar prime-boost RV144 vaccine regimen with additional late boosts provided us with a unique opportunity to test if the gene signature was associated with ADCP 40Pitisuttithum et al.2020. We generated transcriptome-wide gene expression data from peripheral blood 2 weeks after the RV144 vaccine regimen (prior to the additional boosts) and assessed for enrichment of the gene signature with the magnitude of ADCP measured at the same time point in 24 participants. The gene signature with 118 enriched genes was significantly associated with higher magnitude of ADCP (NES=3.0, p<0.001) (Figure 3A, Supplementary file 1b). Using the same geneset, 93 genes were found to be enriched in a subset of overlapping participants (N=21), where samples were collected 3 days after the RV144 immunizations (NES=2.5, p<0.001) (Figure 3A, Supplementary file 1b). The model built using ADCP GES from day 3 was able to predict ADCP responses measured 2 weeks after the last vaccination with an accuracy of 0.71. The receiver operator characteristic (ROC) curve illustrates the discriminating ability of the classifier from the day 3 training data set (AUC=0.8, 95% confidence interval [CI]: 0.6–0.99, p=0.01) and the week 2 testing data set (AUC=0.73, 95% CI: 0.5–0.95, p=0.03) to predict ADCP responses (Figure 3B). To evaluate these findings in the context of an efficacy trial, a GES from the list of enriched genes associating with ADCP from both time points was computed in the RV144 study. ADCP GES from both time points correlated strongly with the protective RV144 GES (Rho=0.74, p=2.2e−16, Rho=0.75, p=2.2e−16) (Figure 3C–D). Given the strong correlation in RV144 for the enriched genes from both time points with infection status, we investigated the overlapping 82 genes from day 3 and week 2 time points in a prediction analysis (Figure 4A–B). In addition to being able to successfully predict ADCP magnitude, the genes also show a very clear distinction between the high versus low ADCP groups at both time points (Figure 4C, Supplementary file 1b). To gain understanding of the potential role of the 82 genes, we used GeneMANIA to explore the gene function of the 82 overlapping genes based on physical interaction, co-expression, co-localization, and shared pathways. There were 41 genes that belonged to specific top pathways including the immune system, innate immune system, signaling by interleukins, hallmark inflammatory response, hallmark TNFA signaling via NFKB, cell-cell communication, interleukin-10 signaling, signal regulatory protein family interactions, and pentose phosphate pathway (Figure 4D, Supplementary file 1c). A focused search for gene ontology (GO) terms identified four genes with phagocytosis pathway membership (TYROBP, SIRPA, SIRPB1, and RAB20) (Supplementary file 1d).

# Reading in GSEA result folders
v5Bwk26ADCP <- read.table(file = "data/v5B_ADCP_wk26_method_2_GT.Gsea.1592331997827/GSE29618_BCELL_VS_MONOCYTE_DAY7_FLU_VACCINE_DN.xls", sep = "\t", header = T)
v6wk26ADCP <- read.table(file = "data/v6_ADCP_wk26_method_2_GT.Gsea.1592331997828/GSE29618_BCELL_VS_MONOCYTE_DAY7_FLU_VACCINE_DN.xls", sep = "\t", header = T)

# Reading in GSEA run statistics
gsea_results <- read_xlsx("data/Updated_core_enrichment.xlsx", sheet = 2)

# Setting plot labels

x_lab = paste("\U2190","ADCP High        Rank in Ordered Dataset       ADCP Low", "\U2192")
y_lab = "Enrichment Score (ES)"


### Updated Enrichment Plots (combined Day 3 + Week 2)

# Creating legends for Day 3 (v5B)

nes_v5Bwk26ADCP = signif(gsea_results %>% 
                           filter(analysis == "v5B_ADCP_wk26_GT") %>%
                           dplyr::select(NES), digits=2)

pval_v5Bwk26ADCP = ifelse(round(gsea_results %>% 
                                  filter(analysis == "v5B_ADCP_wk26_GT") %>%
                                  dplyr::select(NOM.p.val), digits=3) == 0, 0.001,
                          round(gsea_results %>% 
                                  filter(analysis == "v5B_ADCP_wk26_GT") %>% 
                                  dplyr::select(NOM.p.val), digits=3))

name_v5Bwk26ADCP = "Day 3"
legend_v5Bwk26ADCP = paste(name_v5Bwk26ADCP, " NES = ", nes_v5Bwk26ADCP, ", p < ", pval_v5Bwk26ADCP, sep="")
max_v5Bwk26ADCP<- max(v5Bwk26ADCP$RANK.IN.GENE.LIST)


# Creating legends for Week 2 (v6)

nes_v6wk26ADCP = signif(gsea_results %>% 
                          filter(analysis == "v6_ADCP_wk26_GT") %>% 
                          dplyr::select(NES), digits=2)

pval_v6wk26ADCP = ifelse(round(gsea_results %>% 
                                 filter(analysis == "v6_ADCP_wk26_GT") %>% 
                                 dplyr::select(NOM.p.val), digits=3) == 0, 0.001,
                         round(gsea_results %>% 
                                 filter(analysis == "v6_ADCP_wk26_GT") %>% 
                                 dplyr::select(NOM.p.val), digits=3))

name_v6wk26ADCP = "Week 2"
legend_v6wk26ADCP = paste(name_v6wk26ADCP, " NES = ", nes_v6wk26ADCP, ", p < ", pval_v6wk26ADCP, sep="")
max_v6wk26ADCP<- max(v6wk26ADCP$RANK.IN.GENE.LIST)


# Plotting ES v/s Rank

plotmax <- ifelse(max_v5Bwk26ADCP > max_v6wk26ADCP, max_v5Bwk26ADCP, max_v6wk26ADCP)

tplot1 <- ggplot(data = v6wk26ADCP, aes(x = RANK.IN.GENE.LIST, y = RUNNING.ES)) +  
  geom_line(aes(color ="#2844c9"), size=1) +  
  geom_line(data = v5Bwk26ADCP, mapping = aes(x = RANK.IN.GENE.LIST, y = RUNNING.ES, color="red"), size=1) + 
  labs(x = x_lab, y = y_lab, color = "Legend\n") + 
  scale_color_manual(labels=c(legend_v6wk26ADCP, legend_v5Bwk26ADCP), values=c("#2844c9", "red")) + 
  theme_bw()

tplot3 <- tplot1 + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
                         panel.background = element_blank()) + 
  theme(axis.title.y = element_text(colour="black",size=16, family="Arial"), 
        axis.title.x = element_text(colour="black",size=16, family="Arial"), 
        axis.line = element_line(color="black"), 
        axis.text.x=element_text(color="black",size="14",family="Arial"), 
        axis.text.y=element_text(color="black",size="14", family="Arial"),
        panel.border = element_blank(),
        panel.background = element_blank()) + 
  geom_hline(yintercept=0,color="black") + 
  theme(legend.position ="bottom", legend.title=element_blank(), 
        plot.margin = unit(c(1,1,1,1), "cm"), 
        legend.text=element_text(color="black",size="14", family="Arial")) + 
  xlim(0, plotmax) + ylim(-0.3, 0.7) + 
  theme(legend.justification = c(1, 1), legend.position = c(1, 1), legend.box.margin=margin(rep(2, 4)))

tplot3

Strong relationship between functional ADCP responses in a human vaccine trial and the protective RV144 signature.

The geneset that associated with protection in an efficacy study was also enriched with higher magnitude of ADCP measured 2 weeks after vaccination in an immunogenicity trial that employed the RV144 vaccine regimen. NES from RNA-seq data at time points (A) 2 weeks (118 enriched genes) (N=24) and 3 days (93 enriched genes) (N=21) post the RV144 vaccine regimen in the RV306 trial are indicated.

load("data/RV306_ADCP_GES.RData")
mydata_rv306_v5b_ADCP_wk26$y = ifelse(mydata_rv306_v5b_ADCP_wk26$adcpGT_wk26_vac_cat_updated == "Low",1,0)
mydata_rv306_v6_ADCP_wk26$y = ifelse(mydata_rv306_v6_ADCP_wk26$adcpGT_wk26_vac_cat_updated == "Low",1,0)

mylogit=glm(y ~trans_93, data=mydata_rv306_v5b_ADCP_wk26,family = binomial(link = "logit"))
prob=predict(mylogit,type=c("response"),newdata = mydata_rv306_v5b_ADCP_wk26)
pred=prediction(prob,mydata_rv306_v5b_ADCP_wk26$y)
perf=performance(pred,"tpr","fpr")
par(mar=c(5.1, 5.1, 5.1, 5.1), bty = 'l',family = "sans")
plot(perf, col="red", xlab="1-Specificity", 
     ylab="Sensitivity",font.axis=2,lwd=2,main =" " )
abline(0,1,lwd=2)
auc=performance(pred,"auc")
auc=round(unlist(slot(auc,"y.values")),2) 
y_obs=mydata_rv306_v5b_ADCP_wk26$y
probpred=predict(mylogit, mydata_rv306_v5b_ADCP_wk26, type="response")
p=roc.area(y_obs,probpred)$p.value
ci=ci.auc(mydata_rv306_v5b_ADCP_wk26$y,mydata_rv306_v5b_ADCP_wk26$trans_93)

legend(0.5, 0.4, c(paste("Training (Day 3)"), paste("AUC = ",round(auc,2))), cex = 0.9,text.col = "red", bty = "n")
### v6 #####
prob <- predict(mylogit, mydata_rv306_v6_ADCP_wk26, type="response")
pred=prediction(prob,mydata_rv306_v6_ADCP_wk26$y)
perf=performance(pred,"tpr","fpr")
par(new=TRUE)
plot(perf, col="blue", xlab="", 
     ylab="",lwd=2)

auc=performance(pred,"auc")
auc=round(unlist(slot(auc,"y.values")),3) 
y_obs=mydata_rv306_v6_ADCP_wk26$y
probpred=predict(mylogit, mydata_rv306_v6_ADCP_wk26, type="response")
p=roc.area(y_obs,probpred)$p.value 
ci=ci.auc(mydata_rv306_v6_ADCP_wk26$y,mydata_rv306_v6_ADCP_wk26$trans_93)
legend(0.5, 0.2, c(paste("Test (Week 2)"), paste("AUC = ",round(auc,2))), cex = 0.9,text.col = "blue", bty = "n")

The model built using ADCP GES from day 3 was able to predict ADCP responses measured 2 weeks after the last vaccination with an accuracy of 0.71. The ROC curve illustrates the discriminating ability of the classifier from the day 3 training data set (AUC=0.8, 95% CI: 0.6–0.99, p=0.01) and the week 2 testing data set (AUC=0.73, 95% CI: 0.5–0.95, p=0.03) to predict ADCP responses.

load("data/mydata_rv144_GES_cor.RData")

trans63=mydata_rv144_GES_cor$trans_63
trans91=mydata_rv144_GES_cor$trans_91
trans115=mydata_rv144_GES_cor$trans_115

##### CORRELATION & PLOT ##########
### TRANS91 ####


myplot=as.data.frame(cbind(trans63,trans91))
p91 <- ggplot(myplot, aes(x = trans63, y = trans91)) +
  geom_point()+
  xlab("GES Infection") +
  ylab("GES ADCP Day 3") +
  theme_bw() + 
  theme(text=element_text(family="Arial"),
        axis.text = element_text(color = "black", size = 12),
        axis.title = element_text(size = 14),
        panel.border = element_blank(), panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(), axis.line = element_line(colour = "black")) +
  
  stat_smooth(formula = y ~ x, method = "lm",
              col = "#C42126",
              se = FALSE,
              size = 1)

### TRANS115 ####
myplot=as.data.frame(cbind(trans63,trans115))
p115 <- ggplot(myplot, aes(x = trans63, y = trans115)) +
  geom_point()+
  xlab("GES Infection") +
  ylab("GES ADCP Week 2") +
  theme_bw() + 
  theme(text=element_text(family="Arial"),
        axis.text = element_text(color = "black", size = 12),
        axis.title = element_text(size = 14),
        panel.border = element_blank(), panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(), axis.line = element_line(colour = "black")) +
  
  stat_smooth(formula = y ~ x,method = "lm",
              col = "#C42126",
              se = FALSE,
              size = 1)

p115 + annotate("text", x = 0.7, y = 1.4, label = "Rho = 0.74   p = 2.2e-16",col = "black",family = "Arial", size = 4.5)

GES computed from the enriched genes associating with ADCP correlated strongly with the protective GES in the RV144 study (N=170) at time points 2 weeks (115 enriched genes) post the RV144 vaccine regimen. ADCP, antibody-dependent cellular phagocytosis; CI, confidence interval; GES, gene expression score; NES, normalized enrichment score.


p91 + annotate("text", x = 0.7, y = 1.3, label = "Rho = 0.75   p = 2.2e-16",col="black",family="Arial", size = 4.5)

GES computed from the enriched genes associating with ADCP correlated strongly with the protective GES in the RV144 study (N=170) at time points 3 days (91 enriched genes) post the RV144 vaccine regimen. ADCP, antibody-dependent cellular phagocytosis; CI, confidence interval; GES, gene expression score; NES, normalized enrichment score.

enriched_genes = read_xlsx("data/Gene_lists.xlsx", sheet = 1)

names(enriched_genes)[c(5,6)] = c("Day 3", "Week 2")
enriched_gene_lists = as.list(enriched_genes)

# Removing NA from Day 3 list, caused due to difference in number of row
enriched_gene_lists$`Day 3` = na.omit(enriched_gene_lists$`Day 3`)

# Plotting a Venn diagram with required colors
ggvenn(enriched_gene_lists, c("Day 3","Week 2"), show_percentage=FALSE, 
       text_size = 6, fill_color = c("red",rgb(10,100,200,max=255)), fill_alpha = 0.85)

Overlapping enriched genes associating with ADCP responses.

There were 82 overlapping genes between the day 3 (N=21) and week 2 (N=24) ADCP enriched genes in the RV306 study.

load("data/RV306_ADCP_GES_overlap.RData")
mydata_rv306_v5b_ADCP_wk26_overlap82$y = ifelse(mydata_rv306_v5b_ADCP_wk26_overlap82$adcpGT_wk26_vac_cat_updated == "Low",1,0)
mydata_rv306_v6_ADCP_wk26_overlap82$y = ifelse(mydata_rv306_v6_ADCP_wk26_overlap82$adcpGT_wk26_vac_cat_updated == "Low",1,0)

mylogit=glm(y ~trans_82, data=mydata_rv306_v5b_ADCP_wk26_overlap82,family = binomial(link = "logit"))
prob=predict(mylogit,type=c("response"),newdata = mydata_rv306_v5b_ADCP_wk26_overlap82)
pred=prediction(prob,mydata_rv306_v5b_ADCP_wk26_overlap82$y)
perf=performance(pred,"tpr","fpr")
par(mar=c(5.1, 5.1, 5.1, 5.1), bty = 'l',family = "sans")
plot(perf, col="red", xlab="1-Specificity", 
     ylab="Sensitivity",font.axis=2,lwd=2,main =" " )
abline(0,1,lwd=2)
auc=performance(pred,"auc")
auc=round(unlist(slot(auc,"y.values")),2) 
y_obs=mydata_rv306_v5b_ADCP_wk26_overlap82$y
probpred=predict(mylogit, mydata_rv306_v5b_ADCP_wk26_overlap82, type="response")
p=roc.area(y_obs,probpred)$p.value
ci=ci.auc(mydata_rv306_v5b_ADCP_wk26_overlap82$y,mydata_rv306_v5b_ADCP_wk26_overlap82$trans_82)

legend(0.5, 0.4, c(paste("Training (Day 3)"), paste("AUC = ",round(auc,2))), cex = 0.9,text.col = "red", bty = "n")
### v6 #####
prob <- predict(mylogit, mydata_rv306_v6_ADCP_wk26_overlap82, type="response")
pred=prediction(prob,mydata_rv306_v6_ADCP_wk26_overlap82$y)
perf=performance(pred,"tpr","fpr")
par(new=TRUE)
plot(perf, col="blue", xlab="", 
     ylab="",lwd=2)

auc=performance(pred,"auc")
auc=round(unlist(slot(auc,"y.values")),3) 
y_obs=mydata_rv306_v6_ADCP_wk26_overlap82$y
probpred=predict(mylogit, mydata_rv306_v6_ADCP_wk26_overlap82, type="response")
p=roc.area(y_obs,probpred)$p.value 
ci=ci.auc(mydata_rv306_v6_ADCP_wk26_overlap82$y,mydata_rv306_v6_ADCP_wk26_overlap82$trans_82)
legend(0.5, 0.2, c(paste("Test (Week 2)"), paste("AUC = ",round(auc,2))), cex = 0.9,text.col = "blue", bty = "n")

The model using GES obtained from the 82 genes was also able to predict ADCP responses measured 2 weeks after vaccination with an accuracy of 0.71. The ROC curve illustrates the discriminating ability of the classifier from the day 3 training data set (AUC=0.81, 95% CI: 0.62–1, p=0.007) and the week 2 testing data set (AUC=0.75, 95% CI: 0.53–0.97, p=0.02) to predict ADCP responses.

# Determining overlapping genes between timepoints and creating heatmap
# Reading in gene expression matrices
v6_data = read.table(file = "data/RV306_v6_forADCP_GSEA.txt", header = T, sep = "\t")
v5B_data = read.table(file = "data/RV306_v5B_forADCP_GSEA.txt", header = T, sep = "\t")

enriched_genes = read_xlsx("data/Gene_lists.xlsx", sheet = 1)

names(enriched_genes)[c(5,6)] = c("Day 3", "Week 2")
enriched_gene_lists = as.list(enriched_genes)

# Removing NA from Day 3 list, caused due to difference in number of row
enriched_gene_lists$`Day 3` = na.omit(enriched_gene_lists$`Day 3`)


overlap_genes = as.data.frame(intersect(enriched_genes$`Day 3`, enriched_genes$`Week 2`))
names(overlap_genes)[1] = "Overlapping_genes"

v6_data = v6_data[-c(1)]
names(v6_data)[1] = "Gene_name"
v6_data = right_join(v6_data, overlap_genes, by = c("Gene_name" = "Overlapping_genes"))
rownames(v6_data) = v6_data$Gene_name
v6_data[1] = NULL

v5B_data = v5B_data[-c(1)]
names(v5B_data)[1] = "Gene_name"
v5B_data = right_join(v5B_data, overlap_genes, by = c("Gene_name" = "Overlapping_genes"))
rownames(v5B_data) = v5B_data$Gene_name
v5B_data[1] = NULL

# Reading in metadata containing sample classification information
metadata = read_xlsx("data/Gene_lists.xlsx", sheet = 2)
meta_v5B = subset(metadata, Timepoint == "v5B")
meta_v6 = subset(metadata, Timepoint == "v6")

# Subsetting samples only to those with ADCP
v5B_data = v5B_data %>% dplyr::select(meta_v5B$Sample_ID)
v6_data = v6_data %>% dplyr::select(meta_v6$Sample_ID)

combined_set = cbind(v5B_data, v6_data)

# Calculating mean of expression values for each category

combined_set$Day_3_ADCP_High = rowMeans(combined_set[,1:12])
combined_set$Day_3_ADCP_Low = rowMeans(combined_set[,13:21])
combined_set$Week_2_ADCP_High = rowMeans(combined_set[,22:35])
combined_set$Week_2_ADCP_Low = rowMeans(combined_set[,36:45])

average_counts = as.matrix(combined_set[-c(1:45)])


### Using this matrix to create heatmap


# Setting a color scale
orig_col_scale = colorRampPalette(c("white", "orange", "#721427"))(n = 299)

# Created a dataframe categorizing matrix columns by group
# Used for color annotation

color_df = read.table(file = "data/Color_DF.txt", header = T, sep = "\t", row.names = "Name")

# Assigning colors to categories
annot_colors = list(Timepoint = c("Day 3" = "#12752C", "Week 2" = "#F58142"),
                 ADCP = c("High" = "#882B8D", "Low" = "#5AC4F7"))

# Plotting the heatmap
p1 = pheatmap(average_counts, color = orig_col_scale, scale = "row", 
              cluster_rows = T, cluster_cols = F, clustering_method = "complete",
              cellheight=8,cellwidth=60,
              annotation_col = color_df, annotation_colors = annot_colors,
              annotation_legend = T, annotation_names_col = T, border_color = NA,
              show_colnames = F, fontsize_row = 7, display_numbers = F,
              labels_row = as.expression(lapply(rownames(average_counts),
                                                function(a) bquote(italic(.(a))))))

p2 = ggplotify::as.ggplot(p1) + annotate("text", x = 0.72, y = 0.985, label = "Z-score", fontface = 2, size = 3.5) + theme(panel.background = element_rect(fill = "white", linetype = "blank"))

p2

Heatmap showing the hierarchical clustering of gene expression of the 82 genes (day 3 and week 2 after 4th vaccination) when stratified by magnitude of ADCP responses measured 2 weeks after the 4th immunization.

Pathways and genes shared between ADCP and vaccine protection phenotypes

These findings demonstrate a strong link of the geneset with both vaccine protection and ADCP in NHP and human studies. We sought to broaden our understanding of the relationship between the different enriched genes in the geneset and establish some of the top pathways with gene membership from the different studies. Networks and associated pathways from genes that were significantly enriched with either the ADCP or infection phenotypes from the 09–11, 13–19, RV144, and RV306 studies were determined using GeneMANIA. The top pathways were the immune system, innate immune system, H1F1 TF pathway, hypoxia, TNFA signaling via NFKB, cytokine signaling, inflammation response, signaling by interleukins, and IL-10 signaling (Figure 5A). The genes with the most connections were TYROBP, FPR1, CD14, CCR1, TNFRSF1B, CD68, CD63, CEBPD, and LST1. Clustering on the enriched genes to identify highly interconnected regions in the GeneMANIA network showed that TYROBP, FPR1, CD14, TNFRSF1B, CD68, and LST1 were all members of the cluster with the greatest number of genes (Figure 5B). There were no specific enriched genes that were common to all studies (Supplementary file 1b). Pathway enrichment analysis of the 63 genes in the RV144 signature revealed that the top non-redundant enriched clusters with gene membership were myeloid leukocyte activation, lysosome, and cellular response to oxidative stress genes (Figure 5C).

Pathway analyses of the enriched genes in the different vaccine studies.

A meta-analysis of pathways including enriched genes with reduced infection or higher ADCP was performed. (A) Genes that were enriched in at least one of the nine ADCP or infection status analyses (178) were used as input for GeneMANIA in Cytoscape. The connections between the different genes and the top MSigDB and Reactome pathways are shown. Each gene is represented by a circle and size is proportional to the number of connections with other genes or pathways. The color of each node indicates the enrichment status in the different studies. (B) Clustering of the enriched genes from the different studies. The color of each node represents the membership in a cluster and size is proportional to the number of connections with other genes or pathways. (C) Pathway enrichment analysis results of the 63 enriched genes that associated with reduced infection in the RV144 study. ADCP, antibody-dependent cellular phagocytosis.

Cellular origin of the protective genes by single-cell transcriptomics

To dissect the cellular origin of these genes, we performed simultaneous detection of mRNA and cell surface expression from single cells using the cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) technology in a subset of the vaccinated RV306 participants (Figure 6A). This technology allows simultaneous detection of cell surface markers and mRNA gene expression from the same single cells. Our analyses revealed that a majority of the genes in the RV144 signature were expressed in cells of the myeloid lineage, with monocyte subsets having the highest average gene expression (Figure 6B). A subset of 32 genes were also significantly associated with decreased risk of acquisition in a univariate analysis (odds ratio [OR]<1.0, p<0.05, q<0.1) (Figure 6C). A GES of the 32 significant genes is also associated with decreased risk of acquisition, increased VE, and was able to predict infection status in RV144 (Figure 6—figure supplement 1A-D). A stepwise logistic regression analysis identified specific genes (SEMA4A, SLC36A1, SERINC5, IL17RA, CTSD, CD68, and GAA) to have independent associations with reduced risk of acquisition and was mainly expressed in the monocyte compartment (Figure 6D). CD14+ monocytes also had the greatest number of DEGs that were associated with ADCP, which was not dependent on the frequency of the cell subset (Figure 6E, Figure 6—figure supplement 2).

Cellular origin of the RV144 signature.

Single-cell CITE-seq in vaccinated participants (N=12) who received the RV144 vaccine regimen (day 3 after last vaccination) identified expression of the genes in the signature in cells from the myeloid lineage. (A) Clustering based on cell surface expression of CITE-seq data. (B) Heatmap of the mRNA expression of the 63 genes from the RV144 signature from single cells. Columns represent single cells from different protein cell subsets and rows the mRNA gene expression. (C) Radar plot showing significant genes in the signature that associated with decreased risk of infection in RV144 (p<0.05, q<0.1) (N=170). (D) Feature plots of the expression of the most protective genes show that SEMA4A, IL17RA, CTSD, CD68, and GAA were mainly expressed in monocytes. (E) CD14+ monocytes had the highest number of differentially expressed genes (DEGs) when comparing high versus low ADCP (2 weeks after vaccination) from single-cell CITE-seq vaccinated participants who received the RV144 vaccine regimen (day 3 after last vaccination). ADCP, antibody-dependent cellular phagocytosis.

GES of the most significant genes is a correlate of reduced risk of infection in RV144.

A GES of the 32 enriched genes that were also significantly associated with decreased risk of acquisition in a univariate analysis in the RV144 study was examined as a continuous variable (N=170). (A) GES is associated with lower odds of HIV acquisition compared to the other two primary correlates of risk. Variables were measured at week 26, 2 weeks post last vaccination. For each variable, the OR is reported per 1-SD increase. Transcriptome data was available only in a subset of the 246 donors. (B) Probability of acquiring HIV-1 is lower in individuals with higher GES. (C) Vaccine efficacy is increased significantly in individuals with high GES. (D) Distribution of AUC and accuracy plotted after repeating the process 1000 times showed that GES could predict HIV-1 infection with AUC of 0.69±0.08 and with accuracy of 0.81±0.04. GES, gene expression score; OR, odds ratio.

Frequencies of cell subsets do not differ between ADCP high and low samples.

Frequencies of each cell subset were calculated for each sample. For the cell subsets with average frequencies >1 % of the sample, there were no significant differences between the frequencies of the cell subsets in the ADCP high versus low samples. ADCP, antibody-dependent cellular phagocytosis.

Discussion

Though an effective vaccine has been a challenge for the HIV field, we see a glimpse of optimism in partially protective NHP and human studies 8Barouch et al.20157Barouch et al.20139Barouch et al.201843Rerks-Ngarm et al.200956Vaccari et al.201855Vaccari et al.2016. These studies provide a unique opportunity to identify correlates of reduced risk that could help inform protective signals and enable the design of enhanced vaccine strategies. Targeted and unbiased approaches have implicated non-neutralizing antibodies as the major correlate of reduced risk of HIV infection 8Barouch et al.20157Barouch et al.20139Barouch et al.201824Haynes et al.201256Vaccari et al.201855Vaccari et al.2016. We previously showed that a transcriptomic signature first identified in sorted B cells at time points prior to challenge was a correlate of protection in two NHP studies after administration of the Ad26/gp140 vaccine. This signature is also associated with the increased magnitude of ADCP in the vaccinated monkeys 16Ehrenberg et al.2019. Additionally, we identified this signature in bulk PBMC from other studies that used the ALVAC/protein regimen, suggesting that the gene signature might be an indicator of effective vaccination. In this report, we further investigated this gene signature to answer the following questions, including: (1) can the gene signature’s association with protection be substantiated in additional human efficacy trials, (2) does it associate with ADCP in human trials, and (3) what is the cellular origin of the signature at the single-cell level?

The gene signature was previously associated with HIV vaccine protection in a number of studies with partial protection 16Ehrenberg et al.2019. We hypothesized that if this geneset was a true marker of HIV vaccine protection, it would not be enriched in a failed vaccine trial. HVTN 505 is a DNA based vaccine which, despite not showing overall efficacy in a Phase 2b trial, demonstrated both cellular and antibody effector mediated protection in specific subgroups of individuals in follow-up studies 18Fong et al.201825Janes et al.201734Neidich et al.2019. We performed transcriptomics on sorted cell subsets from HVTN 505 vaccinated individuals and did not observe enrichment of the gene signature, further strengthening our notion that the geneset could be a proxy for vaccine protection. Given that this signature is derived by comparing infected versus uninfected vaccinated groups, this analysis was only tested in studies with partial protection, and not in vaccine regimens where all recipients were infected.

Next, we developed a method to assess this gene signature compared to other correlates of risk in the human RV144 study. This method employs an analytical method using a GES which is computational score generated from the average expression of all genes enriched in the signature and associating with a phenotype. This method was tested across different NHP studies and RV144 and showed consistent association with reduced risk of infection based on the study-specific GES. The composite GES computed from RV144 consisting of the standardized expression of 63 genes had the strongest association with decreased risk of infection and increased vaccine efficacy. The RV144 GES was also able to accurately predict infection status in the study. This study shows that the GES composite score provides a robust analytical measurement to explore the effect of genes as a continuous variable in immune-correlates analyses, and that it could be applied to other ongoing efficacy studies. Further, the analysis of RV144 breakthrough infections was consistent with GES being lower in the vaccinated infected participants compared to placebos and hence protective. These observations, albeit only significant in the group infected with single founder viruses, strengthen the premise of the RV144 GES being a correlate of reduced risk of infection.

We previously showed in NHP challenge studies that the gene signature correlated with an increased magnitude of functional antibody responses 16Ehrenberg et al.2019. Although this geneset is associated with ADCP in NHP, the same analyses were previously not possible in the human RV144 study since this immune response was not reported 24Haynes et al.2012. ADCP has since been implicated with vaccine protection in a number of NHP challenge studies 2Ackerman et al.20188Barouch et al.20157Barouch et al.20139Barouch et al.201811Bradley et al.201734Neidich et al.2019. It is reported that ADCP could be involved in most studies that previously showed antibody-dependent correlates of protection against viruses 54Tay et al.2019. To investigate the effect of the gene signature on the magnitude of ADCP, we performed transcriptomics in samples from a human trial (RV306) that employed the same RV144 regimen. At both day 3 and 2 weeks after the 4th vaccine corresponding to the last RV144 vaccine dose, this signature was associated with an increased magnitude of ADCP responses. A strong correlation was also observed between GES from the ADCP enriched genes and the vaccine protection genes in RV144. The ADCP GES at both day 3 and week 2 after the last vaccination was able to predict ADCP magnitude at peak immunity. Similarly, a GES of the 82 overlapping genes between the two time points was also able to predict ADCP magnitude. This finding would suggest that measuring gene expression 3 days after vaccination in preclinical and clinical trials may be used as a tool for titrating effective responses. Although the 82 genes did not have membership in a previously described Fc-gamma receptor signaling pathway, a GO query identified SIRPA, SIRPB1, RAB20, and TYROBP in the phagocytosis GO term 51Swanson and Hoppe2004 (http://amigo.geneontology.org/amigo/term/GO:0006909#display-lineage-tab). These four genes were also connected to an additional 44 genes from the total 178 genes enriched in at least one of our GSEA analyses, supporting a possible role in phagocytosis (Supplementary file 1d). SIRPA and SIRPB are signal regulatory proteins that are expressed on myeloid cells, and the former is known to bind to CD47 to regulate migration and phagocytosis 6Barclay2009. RAB20 is a member of the Rab GTPase family and is involved in phagosome maturation 46Seto et al.2011. Finally, TYROBP had the maximum connections in a network of these genes, was present in the largest cluster, and was related to other highly connected genes expressed in effector cells such as monocytes or neutrophils. TYROBP (DAP12) is a cytosolic adaptor that associates with triggering receptors expressed on myeloid cells (TREMs) to promote phagocytosis 38N’Diaye et al.2009. Genes such as TNFSF13 and BHLHE40 that were previously identified as protective against SIV/SHIV acquisition in the NHP model were also present in the same cluster, supporting a similar function 16Ehrenberg et al.2019. Taken together, these findings provide evidence supporting the antibody-mediated effector function as a potential mechanistic basis of this protective gene signature.

The enriched genes from ADCP and infection risk in multiple studies of both NHP and human were involved in overlapping functions related to leukocyte activation, lysosomal degradation, and immune stimulation by cytokines. The 63 genes from the RV144 signature had the highest membership in the myeloid leukocyte activation pathway, perhaps alluding to the cellular origin of this signature. The specific genes in the geneset that associated with the greatest odds of reduced risk of infection including SEMA4A, CTSD, CD68, and GAA were all members of this pathway, but not TNFSF13 (APRIL) which was the most protective gene in the NHP studies. Although the geneset of interest was first seen in sorted B cells from vaccinated NHP, it was subsequently identified in transcriptomic data from PBMC in the RV144 study 16Ehrenberg et al.2019. While samples were exhausted from the RV144 primary data set, the RV306 clinical trial that employed the same ALVAC/protein vaccine regimen gave us a unique chance to explore the cellular origin of the RV144 signature using single-cell transcriptomics. Single-cell surface expression data revealed that the majority of genes were expressed in monocytes, which was not surprising given the fact that this geneset was originally defined as genes downregulated in B cells compared to monocytes after influenza vaccination 33Nakaya et al.2011. While our initial study found this signature in sorted B cells from the NHP challenge studies, single-cell data provides further insight that monocytes could be the cellular origin of these genes in the RV144 study. Although monocytes were classified as mononuclear phagocytes almost 50 years ago, assays designed to specifically measure monocyte ADCP were not widely used in the context of vaccination until a few years ago 57Furth et al.1972. While monocytes have been implicated in vaccine-induced protection in preclinical vaccine trials of SIV challenge, our findings in human trials at the single-cell level provide greater impetus to explore the role of other non-lymphoid cell populations on HIV-1 VE 21Gorini et al.202056Vaccari et al.2018. Though we think that monocytes are important in the vaccine responses observed in RV144, it would be remiss not to mention that the effect of granulocytes (including neutrophils) in response to vaccination is missed when transcriptomics is performed in PBMC compared to blood. Regardless of the cellular origin, we think this set of 200 genes with a coordinated expression may not be specific to a cell type, but might mark a certain biological state, such as response to a cytokine, and can be identified even in PBMC and blood samples. Other than the phagocytic cell, antibody and Fc receptor diversity can also influence ADCP mediated immune responses to viral pathogens and are elements that warrant further study and may potentially be manipulated to improve VE 14Chung and Alter201720Geraghty et al.201954Tay et al.2019.

Our data demonstrate the potential to discover novel protective correlates using an approach that mines transcriptomic data in multiple preclinical and clinical trials. Unbiased transcriptome-wide analyses are able to identify biological perturbations that associate with vaccine protection even when differences are small, but credibility can only be strengthened by replicating findings across multiple studies. Gene signatures that associate not only with vaccine protection but with specific immune responses can be a prospective tool to evaluate vaccine effectiveness even prior to challenge or infection. Developing analytical tools that can interface with phenotypes such as vaccine protection across human and preclinical studies can allow for more systematic meta-analyses of data emerging from the ongoing, non-efficacious, or halted HIV vaccine clinical trials 22Gray et al.202136NIH202037NIH2021. We propose that assessment of such gene signatures with immune responses in human immunogenicity trials could provide orthogonal insight for down-selection of vaccine candidates. Identifying overlapping immune correlates could be pivotal to making discoveries that may allow for licensure and subsequent bridging studies of an effective HIV vaccine.

Materials and methods

Study design

The aim of the study was post hoc analyses of a protective gene expression signature identified previously in five SIV/HIV vaccine studies with efficacy and immune response data 16Ehrenberg et al.2019. To enable interpretation of this gene signature, bulk RNA-seq, scRNA-seq, and functional data were generated in clinical samples from the RV306 and HVTN 505 human trials. The RV306 vaccine trial was conducted in Thailand and all participants received the primary RV144 ALVAC/gp120 vaccine series, with additional late boosts assigned to specific groups 40Pitisuttithum et al.2020. Bulk RNA-seq was performed in 24 participants 2 weeks after the RV144 vaccine regimen (week 26). Additionally, RNA-seq was also performed 3 days after the same primary endpoint. The HVTN 505 trial used a DNA/rAd5 vaccine regimen to test safety and efficacy in a US population 23Hammer et al.2013. PBMC collected 1 month after the final immunization (month 7) was available from 47 vaccines in the HVTN 505 study for RNA-seq 23Hammer et al.2013. The infection status of the vaccinees (22 cases and 25 controls) was categorized based on infection status between months 7 and 24. Microarray transcriptome data from PBMC and immune response data for 170 vaccinated individuals from the RV144 study at time point 2 weeks post last vaccination was used for correlates analyses 19Fourati et al.201924Haynes et al.2012. All studies were approved by the participating local and international institution review boards. Informed consent was obtained from all participants in the different trials included in this study 23Hammer et al.201340Pitisuttithum et al.2020.

Bulk transcriptomics

RNA was extracted from sorted B cells (Aqua live/deadCD20+CD3) and monocytes (Aqua live/deadCD20CD3CD56HLA-DR+CD14+) from PBMC of HVTN 505 vaccinees using RNAzolRT (MRC Inc) as per recommendations from the manufacturer. For the preparation of mRNA libraries, polyadenylated transcripts were purified on oligo-dT magnetic beads, fragmented, reverse transcribed using random hexamers, and incorporated into barcoded cDNA libraries based on the Illumina TruSeq platform. Next, libraries were validated by electrophoresis, quantified, pooled, and clustered on Illumina TruSeq v2 flow cells. Clustered flow cells were sequenced on an Illumina HiSeq (2000/4000) using 2×75 base paired-end runs. Total RNA from RV306 participants was extracted from whole blood collected in PAXgene Blood RNA tubes using associated RNA extraction (both QIAGEN; Germantown, MD) and GlobinClear purification kits (Thermo Fisher Scientific; Waltham, MA) as per the manufacturer’s suggestions. RNA-seq was performed using the SMART-Seq technology 39Picelli et al.201442Ramsköld et al.2012. Briefly, cDNA was generated from 10 ng of RNA using the SMART-Seq v4 UltraLow Input RNA Prep Kit (Takara Bio Inc) as per the manufacturer’s suggestions, with control RNA spiked-in (Thermo Fisher Scientific). Sequencing libraries were generated using the Nextera XT DNA Sample Prep Kit (Illumina, San Diego, CA). Concentration of each sample in the pooled libraries was determined using the paired-end 300-cycle MiSeq Reagent Nano Kit v2 (2×150 bp) on a MiSeq instrument (both Illumina). Next-generation sequencing was performed on a final adjusted library pool using the paired-end 300-cycle NovaSeq 6000 S2 XP Reagent Kit (2×150 bp) on a NovaSeq instrument (both Illumina) as per the manufacturer’s instructions. Fastp v0.19.7 and Trimmomatic v0.33 with default parameters were used to trim low-quality bases from both ends of each read 10Bolger et al.201413Chen et al.2018. Trimmed reads were aligned to the human genome (GRCh38 build 88–92) using HISAT2 v2.1.0 or the STAR aligner (v2.4.2a) and HTSeq (v0.6.1–0.9.1) was used for counting 15Dobin et al.201326Kim et al.20153Anders et al.2015. Trimmed mean of M-values normalization method, as implemented in the R package edgeR, was used for normalization 44Robinson et al.2010.

Single-cell transcriptomics

Simultaneous evaluation of mRNA and cell surface expression from single cells was performed using feature barcoding (FB) technology from 10× Genomics, based on the CITE-seq technology 47Stoeckius et al.2017. Cell hashing (HTO) was used in conjunction with the 10× Genomics 5’V(D)J Feature Barcoding Kit to generate single-cell mRNA gene expression (GEX) and antibody-derived tag (ADT) libraries 47Stoeckius et al.201748Stoeckius et al.2018. Briefly, PBMC from 12 samples were hashed using TotalSeq-C anti-human Hashtag antibodies and combined into two batches. In each batch, surface proteins were stained with a cocktail of 53 TotalSeq-C antibodies (BioLegend). Antibody concentrations were either predetermined by titration 27Kotliarov et al.2020 or used at a default concentration. 50,000 cells from each batch were loaded onto each of four wells of a Chromium chip, and GEX and ADT (HTO and FB) libraries were constructed following the manufacturer’s protocol. Libraries were pooled and quantitated using a MiSeq Nano v2 reagent cartridge. Final libraries were sequenced on the NovaSeq 6000, S4 reagent cartridge (2×100 bp) (Illumina).

CITE-seq data analyses

FASTQ files were demultiplexed with bcl2fastq v2.20 (Illumina). Alignment and counting were performed using Cell Ranger v3.1.0 (10× Genomics) and the human reference files provided by 10× Genomics (human genome GRCh38 and Ensembl annotation v93). The average number of genes per cell was 1453 and the average number of unique molecular identifiers was 4248. The mean read depth per cell was approximately 65,000–84,000. The minimum fraction of reads mapped to the genome was 88% and sequencing saturation was above 85% for all lanes, with an average of 88%. The computational analysis of ADT data was performed using the Seurat v3.1 package 49Stuart et al.2019. HTO expression matrices were CLR (Centered Log-Ratio) normalized and demultiplexed using MULTIseqDemux. The FB matrices from the Seurat objects were split into cell-positive and negative droplet matrices using the HTO demultiplexing results, and were used for DSB (Denoised and Scaled by Background) normalization 27Kotliarov et al.2020 (https://cran.r-project.org/web/packages/dsb/index.html). Only cells with <10% mitochondrial genes were retained, and cells were assigned to specific donors using the HTO demultiplexing results. A total of 53,777 single cells remained after the quality control process. The gene expression matrices for all samples were normalized and integrated into a single object in Seurat 49Stuart et al.2019. Based on the workflow described in Kotliarov et al., a distance matrix was generated from cell surface protein features 27Kotliarov et al.2020. This matrix was used for shared-nearest-neighbor finding and clustering at resolution=0.5. Neighbor finding and clustering were performed on the integrated gene expression data at a resolution=0.75 and dimensions=1:30. A tSNE (t-distributed stochastic neighbor embedding) was generated from the protein data PCA. Seurat was used to generate a heatmap, dotplot, and featureplots. Differential gene expression testing was performed within each cluster between the high and low ADCP groups using Seurat’s FindMarkers function. ADCP DEG was filtered to genes with >10% expression in either group, a log fold change >0.25, and a Bonferroni p<0.05.

ADCP assay

The antibody effector function ADCP was measured as previously described 1Ackerman et al.201153Tay et al.201952Tay et al.2016. Briefly, A244 gp120 Env-coated fluorescent beads were incubated at 37°C for 2 hr with diluted plasma (1:50) collected at week 26, 2 weeks after administration of the RV144 vaccination series. Anti-CD4 monoclonal antibody-treated THP-1 cells (human monocytic cell line; ATCC TIB-202) (treated for 15 min at 4°C) were added to immune complexes and spinoculated for 1 hr at 4°C to allow phagocytosis to occur. Supernatant was removed, cells were washed, and fixed in paraformaldehyde. Phagocytosis was measured by flow cytometry and a phagocytosis score was calculated as follows: phagocytosis score=(% pos*MFI of Sample)/(% pos*MFI of no-antibody PBS control). The HIV-1 CD4 binding-site broadly neutralizing antibody (bnAb), CH31, was used as a positive control, and the influenza receptor binding site-specific bnAb, CH65, was used as a negative control. Results are representative of two independent experiments.

Pathway analyses

Association of the protective gene signature with infection (HVTN 505) or magnitude of median ADCP (RV306) responses were analyzed using the Gene Set Enrichment Analysis (GSEA) method as described previously 16Ehrenberg et al.201950Subramanian et al.2005. GSEA was performed on vaccinated HVTN 505 participants at the visit seven time points, 1 month after the last immunization. RNA-seq was performed on samples prior to infection, but participants were categorized based on their infection status. GSEA was performed on 45 RV306 RNA-seq samples that also had ADCP scores obtained at the week 26 (week 2 after the 4th vaccination) time point. Participants were categorized into high and low ADCP groups based on the median values of ADCP measured in a total of 79 vaccinated participants. The RNA-seq gene expression values at the day 3 and week 2 time points were then analyzed for gene enrichment using a gene set of 200 genes, obtained from the Broad Institute (GSE29618_BCELL_VS_MONOCYTE_DAY7_FLU_VACCINE_DN), between the two groups of samples. The gene signature of interest was considered significantly enriched using a threshold of NES≥1.4 and p<0.001 as described previously 16Ehrenberg et al.2019. The 178 genes enriched with ADCP or infection status in any of the nine analyses, as well as the 82 genes overlapping in enrichment between the two RV306 time points, were used as search terms in GeneMANIA in the Cytoscape software 31Montojo et al.201058Warde-Farley et al.2010. We selected connections such as co-expression, co-localization, pathway, and physical interactions, as well as Reactome and MSigDB for Attributes. Zero additional genes and up to 10 additional attributes were found with GO biological process-based weighting. The genes in the nine-analysis network were clustered further using the MCODE algorithm in the clusterMaker2 Cytoscape plugin with default settings 32Morris et al.2011. Pathway enrichment analysis of the 63 genes enriched in the RV144 infection analysis was performed using Metascape with default parameters, database v20210201 59Zhou et al.2019.

Correlates of protection

Composite GES was computed as the average of standardized expression of normalized enriched genes in the gene signature in different vaccine studies. The samples in each vaccine study were grouped into outcomes after challenge or infection status after immunization 8Barouch et al.20159Barouch et al.201843Rerks-Ngarm et al.2009. Logistic regression was used for evaluating the association between GES and HIV-1 infection in the RV144 study. The fitting methods accommodate the two-phase sampling design via maximum likelihood estimation 12Breslow and Holubkov1997. Cumulative HIV-1 incidence curves were plotted for the three subgroups of vaccine recipients defined by tertiles into the lower, middle, and upper third of the GES (Low, Medium, and High subgroups), as well as for the entire placebo group HIV negative at week 24 (N=6267 subjects) for reference. These curves were estimated using the Kaplan-Meier method with inverse probability weighting that accounted for the sampling design. Next, VE for the GES subgroups versus the entire placebo group was estimated as one minus the odds of infection in vaccine recipients with Low/Medium/High response divided by the odds of infection in the entire placebo group HIV-1 negative at week 24 of enrollment in the study. The RV144 prediction analysis was implemented by logistic regression. The data set was randomly split into training and testing sets in a 7:3 ratio, while retaining class distributions within the groups. The training data set consisted of 119 individuals while the test data set consisted of 51 individuals. A logistic regression of GES was fit on to the training data set 41Prentice et al.2015. The model’s discriminative ability was evaluated by generating a ROC curve and the corresponding AUC on the test data set. The prediction accuracy of the model was also assessed on the test data set. The probability that gives minimum misclassification error was chosen as the cutoff. This process was repeated 1000 times and the distribution of the resulting AUC and accuracy were demonstrated by a histogram with a density curve. Similar analysis was performed using a GES computed from the 32 genes that were significantly associated with HIV acquisition.

Among 121 RV144 participants who became infected during the trial and had their HIV-1 genome sequenced at diagnosis, 43 had GES measurements computed from microarray data 19Fourati et al.201945Rolland et al.2012. Vaccine and placebo groups were compared overall and after stratifying infections with single HIV-1 founders.

Other statistical analyses

Logistic regression that accounted for the sampling design was used for the univariate analyses of the 63 enriched genes. A radar plot of the significant genes was generated to illustrate OR and 95 % CIs. All ORs were reported per 1-SD increase. Significant genes resulted from univariate logistic regressions of the 63 enriched genes were further analyzed with a multivariate stepwise logistic regression to identify genes that were independently associated with HIV protection. Akaike information criterion was used to identify the optimal set of genes. The expressed enriched genes associated with higher magnitude of ADCP in RV306 at day 3 and 2 weeks post the RV144 vaccine regimen were used to compute the ADCP GES in RV144. Spearman correlation was calculated between the ADCP GES from the two time points and the infection GES, respectively.

For prediction analyses, a GES was computed using the 93 genes enriched at the day 3 time point associating with magnitude of ADCP in RV306. The performance of the classifier was assessed using AUC with 95% CI. This model was then tested at the week 2 time point. The prediction accuracy of the model was also assessed on the week 2 test data set. The probability that gives minimum misclassification error was chosen as the cutoff. Similar analysis was done using the 82 genes overlapping between the 118 enriched genes from week 2 and the 93 enriched genes from day 3 post the 4th RV144 vaccination series that were associated with ADCP in RV306. The average expression values for 82 overlapping genes in RV306 at the day 3 and week 2 post vaccination time points were stratified by ADCP scores to generate a heatmap with the R package pheatmap. Values for each gene were scaled and the resulting z-scores were hierarchically clustered using the ‘complete’ method.

All descriptive and inferential statistical analyses were performed using GraphPad Prism 8 (GraphPad Software) and R 3.6.1 (or later) software packages. Comparison of groups was performed using Mann-Whitney tests or t-tests when assumptions were met. All logistic regression models were adjusted for gender and baseline risk behavior and one significant principal component axis 24Haynes et al.201241Prentice et al.2015. A two-sided p-value of less than 0.05 was considered significant. The Benjamini and Hochberg method was used to calculate false discovery rate-adjusted p-values for multiple testing corrections.

References

    A robust, high-throughput assay to determine the phagocytic activity of clinical antibody samples366Journal of Immunological Methods819
    Route of immunization defines multiple mechanisms of vaccine-mediated protection against SIV24Nature Medicine15901598
    HTSeq–a Python framework to work with high-throughput sequencing data31Bioinformatics166169
    Potential impact of low efficacy HIV-1 vaccines in populations with high rates of infection261Proceedings. Biological Sciences147151
    Predicting the impact of a partially effective HIV vaccine and subsequent risk behavior change on the heterosexual HIV epidemic in low- and middle-income countries: A south african example46Journal of Acquired Immune Deficiency Syndromes7890
    Signal regulatory protein alpha (SIRPalpha)/CD47 interaction and function21Current Opinion in Immunology4752
    Protective efficacy of a global HIV-1 mosaic vaccine against heterologous SHIV challenges in rhesus monkeys155Cell531539
    Protective efficacy of adenovirus/protein vaccines against SIV challenges in rhesus monkeys349Science320324
    Evaluation of a mosaic HIV-1 vaccine in a multicentre, randomised, double-blind, placebo-controlled, phase 1/2a clinical trial (APPROACH) and in rhesus monkeys (NHP 13-19)392Lancet232243
    Trimmomatic: a flexible trimmer for Illumina sequence data30Bioinformatics21142120
    Pentavalent HIV-1 vaccine protects against simian-human immunodeficiency virus challenge8Nature Communications
    Weighted likelihood, pseudo-likelihood and maximum likelihood methods for logistic regression analysis of two-stage data16Statistics in Medicine103116
    fastp: an ultra-fast all-in-one FASTQ preprocessor34Bioinformatics
    Systems Serology: Profiling vaccine induced humoral immunity against HIV14Retrovirology
    STAR: ultrafast universal RNA-seq aligner29Bioinformatics1521
    A vaccine-induced gene expression signature correlates with protection against SIV and HIV in multiple trials11Science Translational Medicine
    An HIV vaccine is essential for ending the HIV/AIDS pandemic318JAMA15351536
    Modification of the association between t-cell immune responses and human immunodeficiency virus type 1 infection risk by vaccine-induced antibody responses in the HVTN 505 trial217The Journal of Infectious Diseases12801288
    Integrated systems approach defines the antiviral pathways conferring protection by the RV144 HIV vaccine10Nature Communications
    Effect of FC receptor genetic diversity on HIV-1 disease pathogenesis10Frontiers in Immunology
    Engagement of monocytes, NK cells, and CD4+ Th1 cells by ALVAC-SIV vaccination results in a decreased risk of SIVmac251 vaginal acquisition16PLOS Pathogens
    Vaccine efficacy of ALVAC-HIV and bivalent subtype C GP120-MF59 in adults384The New England Journal of Medicine10891100
    Efficacy trial of a DNA/rAd5 HIV-1 preventive vaccine369The New England Journal of Medicine20832092
    Immune-correlates analysis of an HIV-1 vaccine efficacy trial366The New England Journal of Medicine12751286
    Higher T-Cell Responses Induced by DNA/rAd5 HIV-1 Preventive Vaccine Are Associated With Lower HIV-1 Infection Risk in an Efficacy Trial215The Journal of Infectious Diseases13761385
    HISAT: a fast spliced aligner with low memory requirements12Nature Methods357360
    Broad immune activation underlies shared set point signatures for vaccine responsiveness in healthy individuals and disease activity in patients with lupus26Nature Medicine618629
    Molecular signatures of antibody responses derived from a systems biology study of five human vaccines15Nature Immunology195204
    Effectiveness of UNAIDS targets and HIV vaccination across 127 countries114PNAS40174022
    “Mosaic” HIV vaccine to be tested in thousands of people across the world572Nature165166
    GeneMANIA Cytoscape plugin: fast gene function predictions on the desktop26Bioinformatics29272928
    Clustermaker: A multi-algorithm clustering plugin for cytoscape12BMC Bioinformatics
    Systems biology of vaccination for seasonal influenza in humans12Nature Immunology786795
    Antibody Fc effector functions and IgG3 associate with decreased HIV-1 risk129The Journal of Clinical Investigation48384849
    https://www.nih.gov/news-events/news-releases/nih-partners-launch-hiv-vaccine-efficacy-study
    https://www.nih.gov/news-events/news-releases/experimental-hiv-vaccine-regimen-ineffective-preventing-hiv
    https://www.nih.gov/news-events/news-releases/hiv-vaccine-candidate-does-not-sufficiently-protect-women-against-hiv-infection
    TREM-2 (triggering receptor expressed on myeloid cells 2) is a phagocytic receptor for bacteria184The Journal of Cell Biology215223
    Full-length RNA-seq from single cells using Smart-seq29Nature Protocols171181
    Late boosting of the RV144 regimen with AIDSVAX B/E and ALVAC-HIV in HIV-uninfected Thai volunteers: a double-blind, randomised controlled trial7The Lancet. HIV
    HLA class II genes modulate vaccine-induced antibody responses to affect HIV-1 acquisition7Science Translational Medicine
    Full-length mRNA-Seq from single-cell levels of RNA and individual circulating tumor cells30Nature Biotechnology777782
    Vaccination with ALVAC and AIDSVAX to prevent hiv-1 infection in Thailand361The New England Journal of Medicine22092220
    edgeR: a Bioconductor package for differential expression analysis of digital gene expression data26Bioinformatics139140
    Increased HIV-1 vaccine efficacy against viruses with genetic signatures in Env V2490Nature417420
    Rab GTPases regulating phagosome maturation are differentially recruited to mycobacterial phagosomes12Traffic407420
    Simultaneous epitope and transcriptome measurement in single cells14Nature Methods865868
    Cell Hashing with barcoded antibodies enables multiplexing and doublet detection for single cell genomics19Genome Biology
    Comprehensive Integration of Single-Cell Data177Cell18881902
    Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles102PNAS1554515550
    The coordination of signaling during FC receptor-mediated phagocytosis76Journal of Leukocyte Biology10931103
    Antibody-mediated internalization of infectious HIV-1 virions differs among antibody isotypes and subclasses12PLOS Pathogens
    Rare detection of antiviral functions of polyclonal iga isolated from plasma and breast milk compartments in women chronically infected with HIV-193Journal of Virology
    Antibody-dependent cellular phagocytosis in antiviral immune responses10Frontiers in Immunology
    Adjuvant-dependent innate and adaptive immune signatures of risk of SIVmac251 acquisition22Nature Medicine762770
    HIV vaccine candidate activation of hypoxia and the inflammasome in CD14(+) monocytes is associated with a decreased risk of SIVmac251 acquisition24Nature Medicine847856
    The mononuclear phagocyte system: A new classification of macrophages, monocytes, and their precursor cells46Bulletin of the World Health Organization845852
    The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function38Nucleic Acids Research
    Metascape provides a biologist-oriented resource for the analysis of systems-level datasets10Nature Communications