Example Benchmark Figures

Benchmarking Methods for Analysis of RNA-seq Data

The code notebooks provided on this resource allow users to apply multiple normalization, differential gene expression, and enrichment analysis methods to an RNA-seq dataset. Here, we provide example results and visualizations from the analysis of the study GSE159094.

4 control samples (human PBMCs with no drug treatment)
4 treatment samples (human PBMCs treated with 10^-7M dexamethasone)

Differential Gene Expression Methods for RNA-seq Data

We would expect the target genes of transcription factors bound by dexamethasone to show significant differential expression under dexamethasone perturbation. Because target genes may have increased or decreased expression, gene signatures are ranked by the following values for each method:

Characteristic Direction: Absolute value of the CD-coefficient
Limma-voom: Adjusted p-value
pyDESeq2: Adjusted p-value
Welch's t-test: p-value
Log fold change: Absolute value of the log fold change
Random: Randomly shuffled gene order

The boxplots display the enrichment rankings of NR3C1-related terms from the Enrichr ChEA 2022 library for gene sets from each method (left) as well as gene sets split by both direction and method (right). The limma-voom (Law et al. 2014; Ritchie et al. 2015) and pyDESeq2> (Love et al. 2014; Muzellec et al. 2022) methods are performing the best at retrieving known target genes of NR3C1. Additionally, since NR3C1 is primarily an activator, we can see a directional effect where up-regulated genes under dexamethasone perturbation are higher ranked than combined and down gene sets.

Enrichment rankings of NR3C1 terms for gene sets derived from each method

Enrichment rankings of NR3C1 terms for each directional gene set derived from each method

The Brownian bridge plots below show the average rankings of genes from NR3C1, NR0B1, and NR1I2 target gene sets in dexamethasone perturbation gene expression signatures computed using each method. The x-axis represents the gene rank within the signature normalized to [0,1], while the y-axis provides the difference from the uniform cumulative distribution function (CDF), D(r)-r. For more information on Brownian bridge plot visualizations, please refer to our paper on the characteristic direction method (Clark et al. 2014).

Once again, limma-voom and pyDESeq2 appear to recover the most NR3C1 target genes in the leading edge.

Brownian bridge plot showing average rank of NR3C1 terms for each method

Leading edge of Brownian bridge plot showing average rank of NR3C1 terms for each method

References

Clark, N. R., Hu, K. S., Feldmann, A. S., Kou, Y., Chen, E. Y., Duan, Q., & Ma'ayan, A. (2014). The characteristic direction: a geometrical approach to identify differentially expressed genes. BMC bioinformatics, 15, 79. https://doi.org/10.1186/1471-2105-15-79

Law, C. W., Chen, Y., Shi, W., & Smyth, G. K. (2014). voom: Precision weights unlock linear model analysis tools for RNA-seq read counts. Genome biology, 15(2), R29. https://doi.org/10.1186/gb-2014-15-2-r29

Ritchie, M. E., Phipson, B., Wu, D., Hu, Y., Law, C. W., Shi, W., & Smyth, G. K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic acids research, 43(7), e47. https://doi.org/10.1093/nar/gkv007

Love, M. I., Huber, W., & Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome biology, 15(12), 550. https://doi.org/10.1186/s13059-014-0550-8

Muzellec, B., Telenczuk, M., Cabeli, V., & Andreux, M. (2022). PyDESeq2: a python package for bulk RNA-seq differential expression analysis. bioRxiv doi:10.1101/2022.12.14.520412.