Abstract
A single drug usually does not yield a good therapeutic effect in complex diseases such as cancer. To overcome the shortcomings of monotherapy, such as lack of efficacy and drug resistance, researchers are shifting from the traditional “one gene, one drug, one disease” paradigm to searching for synergistic multi-drug combinations that can modulate multiple cellular processes simultaneously. Because the number of possible combinations is huge, traditional high-throughput screening is impractical for finding effective multi-drug therapies; an in silico approach that can predict drug synergism would greatly reduce the time and cost of experimental validation.
To encourage the development of computational models for in silico drug screening, the DREAM Challenge initiative, in collaboration with the National Cancer Institute, launched a community-based challenge in 2012 for predicting compound pair synergism. Transcriptomic and dose response data for 14 individual drugs were generated and used to predict the synergistic effect of all 91 possible drug pairs. Predictions were evaluated against the experimentally derived synergistic effects and quantified with the probabilistic concordance index (PC-index). In the challenge, the PC-indices of the 31 submissions ranged from 0.42 to 0.61 (a random guess has a PC-index of 0.5). The best-performing method, the Drug-Induced Genomic Residual Effect (DIGRE) model developed by our team, achieved a significantly higher PC-index (0.613) and much greater consistency than the remaining methods.
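As its name suggests, the PC-index builds on the concordance index. As a rough illustration of that underlying idea only, the R sketch below computes the plain concordance index: the fraction of item pairs whose predicted ordering agrees with the observed ordering, with tied predictions counted as one half. It does not model uncertainty in the experimental measurements, so it is a simplification and not the challenge's scoring code; `concordance_index` is a hypothetical helper.

```r
# Plain (non-probabilistic) concordance index.
# NOTE: simplification for illustration; the DREAM PC-index additionally
# accounts for uncertainty in the measured synergy, which is not modeled here.
concordance_index <- function(predicted, observed) {
  stopifnot(length(predicted) == length(observed))
  n <- length(predicted)
  agree <- 0
  total <- 0
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      # Pairs tied in the observed data carry no ordering information
      if (observed[i] == observed[j]) next
      total <- total + 1
      if (predicted[i] == predicted[j]) {
        agree <- agree + 0.5          # tied prediction counts as half
      } else if (sign(predicted[i] - predicted[j]) ==
                 sign(observed[i] - observed[j])) {
        agree <- agree + 1            # concordant pair
      }
    }
  }
  agree / total
}

obs <- c(0.9, 0.5, 0.1, 0.3)
concordance_index(obs, obs)    # 1: identical ordering
concordance_index(-obs, obs)   # 0: fully reversed ordering
```

A random ordering of the predictions agrees with the truth on about half of the pairs, which is why a random guess scores around 0.5.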
Please cite our papers:
“A community computational challenge to predict the activity of pairs of compounds.” Nature Biotechnology 32.12 (2014): 1213-1222.
“DIGRE: Drug-Induced Genomic Residual Effect Model for Successful Prediction of Multidrug Effects.” CPT: Pharmacometrics & Systems Pharmacology 4.2 (2015): 91-97.
The DIGRE model takes three kinds of input data to predict compound synergy: drug-treated gene expression data, drug dose response data, and gene-gene interaction information.
The first input is the gene expression profile (microarray or RNA-Seq) of cells after drug perturbation. Prepare a data frame in which each column represents one drug treatment and each row represents one gene, and include a negative control such as DMSO treatment. The demo data geneExp.demo provided in the package comes from the NCI-DREAM challenge for predicting drug pair synergy: the OCI-LY3 human diffuse large B-cell lymphoma (DLBCL) cell line was treated with each of 14 drugs at its IC20 dose, and gene expression was measured 24 hours after perturbation.
library(DIGREsyn)
head(geneExp.demo)
#> V1 V2 V3 V4 V5 V6
#> 1 Genename Neg_control Neg_control Neg_control Neg_control Neg_control
#> 2 A1CF 3.91 3.68 3.96 3.81 4.44
#> 3 A1CF 4.56 4.85 4.43 4.61 4.36
#> 4 A2BP1 4.26 4.04 4.14 4.38 4.23
#> 5 A2BP1 4.9 5.04 4.74 4.82 5.2
#> 6 A2BP1 3.98 3.99 4.1 4.09 4.63
#> V7 V8 V9 V10 V11
#> 1 Neg_control Neg_control Neg_control Aclacinomycin A Aclacinomycin A
#> 2 4.51 5.13 4.41 3.81 3.92
#> 3 4.95 4.44 4.45 4.64 4.48
#> 4 3.97 4.71 4.26 4 4.08
#> 5 5.12 4.54 5.77 4.72 4.69
#> 6 3.92 4.62 4.18 4.05 4.12
#> V12 V13 V14 V15 V16
#> 1 Aclacinomycin A Blebbistatin Blebbistatin Blebbistatin Camptothecin
#> 2 3.94 3.76 3.69 4.24 3.9
#> 3 4.78 4.83 4.97 5.58 4.81
#> 4 3.95 4.22 4.03 4.31 4.2
#> 5 5.07 4.94 4.56 5.77 4.8
#> 6 4.19 4.12 4.08 4.37 4.16
#> V17 V18 V19 V20 V21
#> 1 Camptothecin Camptothecin Cycloheximide Cycloheximide Cycloheximide
#> 2 3.64 3.96 4.02 4.03 3.79
#> 3 4.8 4.7 4.81 4.81 4.68
#> 4 4.12 4.22 4.14 4.14 4
#> 5 5.01 5 4.83 4.81 4.75
#> 6 4.41 4.12 4.04 4.19 4.12
#> V22 V23
#> 1 Doxorubicin hydrochloride Doxorubicin hydrochloride
#> 2 3.91 4
#> 3 4.87 4.83
#> 4 4.45 4.25
#> 5 5.24 4.86
#> 6 4.3 4.43
#> V24 V25 V26 V27 V28
#> 1 Doxorubicin hydrochloride Etoposide Etoposide Etoposide Geldanamycin
#> 2 3.99 4.01 3.67 3.73 4.09
#> 3 4.56 4.41 4.78 4.63 4.77
#> 4 4.21 4.12 4.16 4.2 3.92
#> 5 4.99 4.85 5.09 4.78 4.76
#> 6 4.14 4.18 4.1 4.37 4.19
#> V29 V30 V31 V32
#> 1 Geldanamycin Geldanamycin H-7, Dihydrochloride H-7, Dihydrochloride
#> 2 3.85 4.24 6 5.74
#> 3 4.57 4.95 5.03 4.83
#> 4 4.22 4.22 4.34 4.68
#> 5 4.91 4.78 6.41 7.79
#> 6 4.24 4.18 4.17 6.3
#> V33 V34 V35 V36 V37
#> 1 H-7, Dihydrochloride Methotrexate Methotrexate Methotrexate Mitomycin C
#> 2 5.88 3.95 4.04 3.82 3.68
#> 3 4.47 4.66 4.57 4.53 4.66
#> 4 5.26 4.23 4.24 4.28 3.94
#> 5 6.08 4.79 4.67 5.22 4.77
#> 6 4.36 4.19 4.23 4.31 3.92
#> V38 V39 V40 V41 V42 V43
#> 1 Mitomycin C Mitomycin C Monastrol Monastrol Monastrol Rapamycin
#> 2 4.07 4.15 4.17 3.85 4.35 4.69
#> 3 5.52 4.74 5.56 4.59 4.28 4.94
#> 4 3.99 4.25 4.19 3.85 3.84 4.01
#> 5 4.75 4.99 5.66 4.5 5.32 4.77
#> 6 4.2 4.08 4.26 4.1 4.26 4.09
#> V44 V45 V46 V47 V48
#> 1 Rapamycin Rapamycin Trichostatin A Trichostatin A Trichostatin A
#> 2 4.53 5.14 4.05 4.58 4.39
#> 3 4.88 4.38 4.86 5.44 5.06
#> 4 4.19 4.33 4.21 4.42 4.25
#> 5 4.84 4.74 4.8 4.9 5.07
#> 6 4.37 4.48 4.19 4.5 3.97
#> V49 V50 V51
#> 1 Vincristine Vincristine Vincristine
#> 2 3.86 3.98 5.44
#> 3 4.51 4.46 5.52
#> 4 3.98 4.1 4.34
#> 5 4.74 4.75 5.57
#> 6 4.26 4.09 5.55
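The printout above suggests the layout of geneExp.demo: generic column names (V1, V2, ...), a first row holding the labels ("Genename", "Neg_control", then drug names), a first column of gene symbols, and replicate columns that repeat the same drug name. The sketch below builds a tiny toy table in that layout, assuming your own data should follow the demo's structure; all gene names, drug names and values are placeholders. Reading a file with `read.csv(path, header = FALSE, stringsAsFactors = FALSE)` would give the same V1, V2, ... layout.

```r
# Hypothetical toy input mirroring the layout printed for geneExp.demo.
# Row 1: column labels; column 1: gene symbols (may repeat across probes).
labels <- c("Genename", "Neg_control", "Neg_control", "DrugA", "DrugA", "DrugB")
values <- rbind(
  c("GENE1", "4.10", "4.02", "3.21", "3.30", "4.18"),
  c("GENE1", "4.35", "4.20", "3.12", "3.41", "4.40"),   # second probe, same gene
  c("GENE2", "5.01", "5.12", "5.93", "6.08", "4.97")
)
# unname() drops the matrix dimnames so the columns become V1, V2, ...
toy.geneExp <- as.data.frame(unname(rbind(labels, values)),
                             stringsAsFactors = FALSE)

# Sanity checks on the treatment labels in row 1
drug.labels <- unlist(toy.geneExp[1, -1], use.names = FALSE)
stopifnot("Neg_control" %in% drug.labels)   # a negative control must be present
table(drug.labels)                          # replicates per treatment
```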
The second input is the dose response data of each drug. Prepare a data frame containing the cell viability reduction when the cell line is treated with each drug at two doses. The first dose is the IC20 of the drug, so its viability reduction is always 0.2 for every drug. The second dose is twice the IC20, and its viability reduction is inferred from the dose response curve of each drug. (Users must perform this inference themselves: DIGRE accepts only the processed values, not raw dose response curves, and the two doses are currently fixed at the IC20 and twice the IC20.) The demo data doseRes.demo provided in the package contains dose response data for the 14 drugs from the NCI-DREAM challenge. (Note: make sure the drug names in the dose response data match the drug names in the gene expression data exactly.)
head(doseRes.demo)
#> Aclacinomycin A Blebbistatin Camptothecin Cycloheximide
#> IC20 0.2 0.20 0.20 0.200
#> 2*IC20 0.3 0.29 0.26 0.255
#> Doxorubicin hydrochloride Etoposide Geldanamycin
#> IC20 0.20 0.2 0.20
#> 2*IC20 0.32 0.3 0.27
#> H-7, Dihydrochloride Methotrexate Mitomycin C Monastrol Rapamycin
#> IC20 0.20 0.2 0.20 0.20 0.20
#> 2*IC20 0.45 0.2 0.33 0.42 0.71
#> Trichostatin A Vincristine
#> IC20 0.20 0.20
#> 2*IC20 0.39 0.23
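Since the 2*IC20 value has to be inferred by the user, here is one possible way to do it, assuming the dose response curve follows a Hill (log-logistic) model E(d) = 1 / (1 + (IC50 / d)^h), where E is the viability reduction and h the Hill slope. Under that assumption, E(IC20) = 0.2 gives (IC50 / IC20)^h = 4, so the reduction at twice the IC20 is 1 / (1 + 4 * 2^-h) and depends only on h. The model choice, the helper names (`hill_effect`, `ic20_from_hill`, `dose_pair`) and the curve parameters are all hypothetical; DIGRE itself does not fit curves. The sketch also assembles a two-row table shaped like doseRes.demo and checks drug names against the expression data.

```r
# Assumed Hill model for fractional viability reduction at dose d
hill_effect <- function(dose, ic50, h) 1 / (1 + (ic50 / dose)^h)

# Solve E(d) = 0.2  =>  (ic50 / d)^h = 4  =>  d = ic50 * 4^(-1/h)
ic20_from_hill <- function(ic50, h) ic50 * 4^(-1 / h)

# Reductions at IC20 and at 2*IC20 for one drug's (hypothetical) curve
dose_pair <- function(ic50, h) {
  ic20 <- ic20_from_hill(ic50, h)
  c(hill_effect(ic20, ic50, h), hill_effect(2 * ic20, ic50, h))
}

# Two-row table in the layout printed for doseRes.demo (placeholder drugs)
toy.doseRes <- data.frame(
  DrugA = dose_pair(ic50 = 1.5, h = 1),   # 0.2 and 1/3
  DrugB = dose_pair(ic50 = 0.4, h = 2),   # 0.2 and 1/2
  row.names = c("IC20", "2*IC20"),
  check.names = FALSE                     # keep names like "H-7, Dihydrochloride" intact
)
toy.doseRes

# Drug names must match the expression data exactly (hypothetical list)
expr.drugs <- c("DrugA", "DrugB")
setdiff(expr.drugs, colnames(toy.doseRes))   # character(0): nothing missing
```

A steeper curve (larger h) gives a larger jump from IC20 to 2*IC20, which is one way to read the spread of 2*IC20 values in doseRes.demo.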
The third input is the gene-gene interaction information that DIGRE needs to predict drug interaction effects. You do not need to provide it if you use KEGG pathways in the DIGREscore function. If you prefer to use your own knowledge of the gene network in the specific cell type, prepare a gene-gene interaction table: a two-column table of gene SYMBOL names in which each row represents a pair of connected genes. Interactions are treated as undirected. The demo data geneNetLymph is a lymphoma-specific gene network refined from gene expression data of lymphoma patients.
head(geneNetLymph)
#> Node.A Node.B
#> [1,] "AARS" "AKT1"
#> [2,] "AARS" "ARF1"
#> [3,] "AARS" "ATP2A2"
#> [4,] "AARS" "DDB1"
#> [5,] "AARS" "ESPL1"
#> [6,] "AARS" "HGS"
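Because interactions are treated as undirected, a hand-built table may contain the same edge twice (A-B and B-A) or self-loops. The sketch below is a hedged pre-cleaning step for your own table in the Node.A / Node.B layout of geneNetLymph; whether constGeneNet already handles such cases is not documented here, and `clean_edges` and the toy edges are hypothetical.

```r
# Hypothetical edge table in the two-column layout of geneNetLymph
edges <- matrix(c("AARS", "AKT1",
                  "AKT1", "AARS",   # same undirected edge, reversed
                  "AARS", "ARF1",
                  "DDB1", "DDB1"),  # self-loop
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("Node.A", "Node.B")))

clean_edges <- function(edges) {
  # Sort each pair so that A-B and B-A become the same row
  lo <- pmin(edges[, 1], edges[, 2])
  hi <- pmax(edges[, 1], edges[, 2])
  keep <- lo != hi                                  # drop self-loops
  out <- unique(cbind(Node.A = lo[keep], Node.B = hi[keep]))
  rownames(out) <- NULL
  out
}

net <- clean_edges(edges)
net
length(unique(c(net)))   # distinct genes (nodes): 3
nrow(net)                # undirected connections: 2
```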
Here we show the basic pipeline for using DIGREsyn to predict drug synergy scores with the demo data.
1. Read and profile gene expression data
Read the gene expression data of cells treated with each individual compound and parse it with the profileGeneExp function. (Note: you do not need to set the column and row names of the gene expression data to drug and gene names, since duplicated drug and gene names are common.)
geneExpDiff <- profileGeneExp(geneExp = geneExp.demo)
#> Start parse drug treated gene expression data ...
#> ------------
#> Checking drug name ...
#> Duplicated drug name found, average duplicated data.
#> Drug list you provided:
#> 1. Aclacinomycin A
#> 2. Blebbistatin
#> 3. Camptothecin
#> 4. Cycloheximide
#> 5. Doxorubicin hydrochloride
#> 6. Etoposide
#> 7. Geldanamycin
#> 8. H-7, Dihydrochloride
#> 9. Methotrexate
#> 10. Mitomycin C
#> 11. Monastrol
#> 12. Rapamycin
#> 13. Trichostatin A
#> 14. Vincristine
#> Normalizing data ...
#> Measuring gene expression difference ...
#> Collapse multiple probes to genes ...
#> Done parsing drug treated gene expression data.
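The log above names several processing steps. To make two of them concrete, the base-R sketch below averages replicate columns that share a drug name, collapses multiple probes to one row per gene by averaging, and then takes each drug's difference from the negative control. This is a conceptual illustration under stated assumptions (mean aggregation, simple subtraction on already log-scale values); it skips the normalization step, and profileGeneExp's actual implementation may differ. The toy matrix is hypothetical.

```r
# Toy log-scale expression: rows are probes (symbols may repeat),
# columns are samples labelled by treatment (names may repeat).
expr <- matrix(c(4.0, 4.2, 3.0, 3.2,
                 4.4, 4.6, 3.4, 3.6,
                 5.0, 5.0, 6.0, 6.2),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("GENE1", "GENE1", "GENE2"),
                               c("Neg_control", "Neg_control", "DrugA", "DrugA")))

# 1. Average replicate columns sharing a drug name
by_drug <- sapply(unique(colnames(expr)), function(d)
  rowMeans(expr[, colnames(expr) == d, drop = FALSE]))

# 2. Collapse probes to genes: sum rows per symbol, divide by probe count
sums <- rowsum(by_drug, group = rownames(by_drug))
counts <- table(rownames(by_drug))[rownames(sums)]
by_gene <- sums / as.vector(counts)

# 3. Difference of each drug from the negative control
expr.diff <- by_gene[, colnames(by_gene) != "Neg_control", drop = FALSE] -
  by_gene[, "Neg_control"]
expr.diff   # GENE1: about -1.0, GENE2: about 1.1
```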
2. Read and parse gene interaction data (optional)
Read the gene connectivity data to construct the gene network that DIGRE uses to compare compound effects on upstream and downstream genes.
geneNetLymph.mat <- constGeneNet(geneNet = geneNetLymph)
#> Parsing gene network data ...
#> ------------
#> Total 1058 nodes, total 13248 connnection.
#> Done constructing gene network matrix.
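One common way to represent an undirected gene network as a matrix is a symmetric 0/1 adjacency matrix indexed by gene symbols. The internal structure of geneNetLymph.mat is not shown in this vignette, so the sketch below is an assumption about the representation, not a description of constGeneNet; it counts nodes and connections in the same spirit as the log line above. The toy edges are hypothetical.

```r
# Hypothetical undirected edges (one per row)
edges <- rbind(c("AARS", "AKT1"),
               c("AARS", "ARF1"),
               c("AKT1", "ARF1"),
               c("DDB1", "AARS"))

genes <- sort(unique(c(edges)))
adj <- matrix(0L, nrow = length(genes), ncol = length(genes),
              dimnames = list(genes, genes))

# A two-column character matrix indexes cells by row and column names
adj[edges] <- 1L
adj[edges[, 2:1]] <- 1L   # mirror each edge: the network is undirected

cat("Total", length(genes), "nodes, total",
    sum(adj[upper.tri(adj)]), "connections\n")
# Total 4 nodes, total 4 connections
```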
3. Predict compound pair synergistic effect
DIGREscore is the core function; it predicts the synergistic scores of all possible compound pairs and their ranks. (Note: the example below uses the default cutoff of 0.6 for the gene expression difference, set through the fold argument; you can choose your own value between 0 and 1.)
pred.res <- DIGREscore(geneExpDiff = geneExpDiff, doseRes = doseRes.demo, pathway = "GeneNet", geneNet = geneNetLymph.mat, fold = 0.6)
#> Start scoring compound pairs by DIGRE model ...
#> ------------
#> Using self-constructed gene network information
#> Fold change cut off used: 0.6 (default:0.6)
#> Done scoring.
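To build intuition for the fold argument, the sketch below shows what a cutoff on the gene expression difference does to a drug's set of affected genes. It assumes the cutoff is applied to the absolute difference with `>=`; the vignette does not spell out the exact rule DIGREscore uses, and the toy difference matrix is hypothetical.

```r
# Hypothetical drug-vs-control expression differences (genes x drugs)
expr.diff <- matrix(c(-1.0,  0.2,
                       1.1, -0.7,
                       0.3,  0.1),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(c("GENE1", "GENE2", "GENE3"),
                                    c("DrugA", "DrugB")))

fold <- 0.6
# Assumed rule: a gene counts as affected if |difference| >= fold
affected <- lapply(colnames(expr.diff), function(d)
  names(which(abs(expr.diff[, d]) >= fold)))
names(affected) <- colnames(expr.diff)
affected   # DrugA: GENE1, GENE2; DrugB: GENE2

# Lower cutoffs keep more genes, higher cutoffs fewer
n.genes <- sapply(c(0.25, 0.6, 0.9), function(f) sum(abs(expr.diff) >= f))
n.genes    # 4 3 2
```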
The DIGREvis function visualizes the prediction results. The heatmap displays the predicted synergistic scores of all compound pairs, with darker colors indicating higher scores. The bar plot shows the top predicted synergistic drug pairs.
Heatmap
vis.heat <- DIGREvis(pred.pair = pred.res$scoreRank, type = "heat")
plot(vis.heat)
Bar plot
vis.bar <- DIGREvis(pred.pair = pred.res$scoreRank, type = "bar")
plot(vis.bar)
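To show how the two views relate to the pairwise scores, the base-R sketch below enumerates all pairs of the 14 demo drugs (choose(14, 2) = 91, matching the number of pairs in the challenge), arranges pair scores into a symmetric drug-by-drug matrix for a heatmap, and takes the top-ranked pairs for a bar plot. The structure of pred.res$scoreRank is not shown in this vignette, so the drug1 / drug2 / score columns are hypothetical, the scores are random placeholders rather than DIGRE predictions, and the plots are base-R stand-ins, not DIGREvis.

```r
drugs <- c("Aclacinomycin A", "Blebbistatin", "Camptothecin", "Cycloheximide",
           "Doxorubicin hydrochloride", "Etoposide", "Geldanamycin",
           "H-7, Dihydrochloride", "Methotrexate", "Mitomycin C", "Monastrol",
           "Rapamycin", "Trichostatin A", "Vincristine")

pairs <- t(combn(drugs, 2))   # one row per unordered drug pair
nrow(pairs)                   # 91

set.seed(1)
scores <- data.frame(drug1 = pairs[, 1], drug2 = pairs[, 2],
                     score = runif(nrow(pairs)),   # placeholder values only
                     stringsAsFactors = FALSE)

# Symmetric drug x drug matrix (diagonal left NA: no self-pairs)
score.mat <- matrix(NA_real_, length(drugs), length(drugs),
                    dimnames = list(drugs, drugs))
score.mat[cbind(scores$drug1, scores$drug2)] <- scores$score
score.mat[cbind(scores$drug2, scores$drug1)] <- scores$score

# Top 5 pairs by score, the content of a "top pairs" bar plot
top <- head(scores[order(scores$score, decreasing = TRUE), ], 5)

heatmap(score.mat, Rowv = NA, Colv = NA, scale = "none")
barplot(rev(top$score), names.arg = rev(paste(top$drug1, top$drug2, sep = " + ")),
        horiz = TRUE, las = 1, cex.names = 0.6)
```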
sessionInfo()
#> R version 3.4.3 (2017-11-30)
#> Platform: x86_64-apple-darwin15.6.0 (64-bit)
#> Running under: macOS High Sierra 10.13.3
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] C/zh_CN.UTF-8/zh_CN.UTF-8/C/zh_CN.UTF-8/zh_CN.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] DIGREsyn_0.2.0
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_0.12.16 pillar_1.2.1 compiler_3.4.3
#> [4] plyr_1.8.4 tools_3.4.3 digest_0.6.15
#> [7] bit_1.1-12 evaluate_0.10.1 RSQLite_2.0
#> [10] memoise_1.1.0 preprocessCore_1.40.0 tibble_1.4.2
#> [13] gtable_0.2.0 pkgconfig_2.0.1 rlang_0.2.0
#> [16] KEGGgraph_1.38.0 graph_1.56.0 rstudioapi_0.7
#> [19] DBI_0.8 commonmark_1.4 yaml_2.1.18
#> [22] parallel_3.4.3 knitr_1.20 withr_2.1.2
#> [25] stringr_1.3.0 roxygen2_6.0.1 xml2_1.2.0
#> [28] desc_1.1.1 S4Vectors_0.16.0 devtools_1.13.5
#> [31] IRanges_2.12.0 rprojroot_1.3-2 stats4_3.4.3
#> [34] bit64_0.9-7 grid_3.4.3 Biobase_2.38.0
#> [37] R6_2.2.2 AnnotationDbi_1.40.0 XML_3.98-1.10
#> [40] rmarkdown_1.9 ggplot2_2.2.1 blob_1.1.1
#> [43] org.Hs.eg.db_3.5.0 magrittr_1.5 htmltools_0.3.6
#> [46] backports_1.1.2 scales_0.5.0 BiocGenerics_0.24.0
#> [49] assertthat_0.2.0 colorspace_1.3-2 labeling_0.3
#> [52] stringi_1.1.7 lazyeval_0.2.1 munsell_0.4.3
#> [55] crayon_1.3.4