Question

No DEGs (adjP 0.99), but GSEA returns enriched pathways with significant adjusted p-values

2.2 years ago

Lalani • 0

Hi.

I am a biologist and very new to RNA-seq/bioinformatics. We just got our RNA-seq results. We had two groups: cells infected with a virus and an uninfected control group (n = 3 biological replicates per group). The infected and control samples do not separate into two distinct groups on PCA. There are no DEGs when adjusted p-values are considered, although some genes have significant nominal p-values and a nonzero log2 fold change. Using the same gene list, I ran GSEA (on the Galaxy server, with standard parameters) and found some enriched Hallmark pathways with significant adjusted p-values.

I am totally confused about whether any of this makes sense. Should I consider my GSEA results reliable?

Again, I am very new, so a very basic explanation of what is happening here would help.

Thanks.

Lalani

Tags: Viruses, DEG, RNASeq, GSEA, Galaxy

score 4 · Accepted Answer · 2022-10-06

jared.andrews07 ★ 18k

While a closer look at your code/DE analysis might be useful, situations like this are sort of the point of GSEA. GSEA can detect subtle, coordinated shifts in gene expression across a biologically related group of genes, e.g. a calcium-signaling pathway. Even if none of these genes is individually differentially expressed, the group as a whole can show a significant shift between the samples being compared.
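As a rough illustration of why a set-level test can be significant when no single gene is, here is a minimal sketch using Stouffer's method for combining z-scores. This is not how GSEA itself computes significance, and it assumes the genes are independent (real pathway genes are not); the gene set and z-scores are hypothetical.

```python
from math import erfc, sqrt


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a z-score under a standard normal null.

    erfc is used instead of 1 - cdf so tiny tail probabilities are not lost
    to floating-point cancellation.
    """
    return erfc(abs(z) / sqrt(2))


def stouffer_z(z_scores: list[float]) -> float:
    """Combine per-gene z-scores into one set-level z-score (Stouffer's method).

    ASSUMPTION: genes are independent. Correlated genes make this combined
    score overconfident, which is one reason GSEA-style tools use permutations.
    """
    return sum(z_scores) / sqrt(len(z_scores))


# Hypothetical gene set: 20 genes, each modestly up-regulated (z = 1.5).
gene_z = [1.5] * 20
print(f"per-gene p:  {two_sided_p(1.5):.3f}")   # about 0.13, not significant
set_z = stouffer_z(gene_z)                       # 1.5 * sqrt(20), about 6.7
print(f"set-level z: {set_z:.2f}, p: {two_sided_p(set_z):.1e}")
```

Each gene on its own is far from significant, but twenty small shifts in the same direction add up to a strong set-level signal.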

I don't know how the Galaxy GSEA tool is implemented. Some implementations take raw expression counts and compute the necessary statistics for you; others expect the genes to be pre-ranked by a meaningful statistic, such as the log fold change or the test statistic.
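For the pre-ranked flavor, the core quantity is a running-sum enrichment score: walk down the ranked list, step up at each gene in the set (weighted by its statistic) and down at each gene outside it, and take the maximum deviation from zero. Below is a minimal sketch of that score with weight exponent p = 1, assuming the list is already sorted from most up- to most down-regulated. It omits the permutation-based p-values and normalized enrichment scores that real tools report, and it is not the Galaxy tool's code; the gene names and statistics are hypothetical.

```python
def enrichment_score(ranked: list[tuple[str, float]], gene_set: set[str],
                     p: float = 1.0) -> float:
    """Running-sum enrichment score for a pre-ranked gene list.

    ranked: (gene, statistic) pairs sorted from most up- to most down-regulated.
    Returns the signed maximum deviation of the running sum from zero.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_hits, n_total = sum(hits), len(ranked)
    if n_hits == 0 or n_hits == n_total:
        raise ValueError("gene set must overlap the list but not cover all of it")
    # Hits are weighted by |statistic|^p; misses all get the same penalty.
    hit_norm = sum(abs(stat) ** p for (_, stat), is_hit in zip(ranked, hits) if is_hit)
    if hit_norm == 0:
        raise ValueError("all statistics for the gene set are zero")
    miss_step = 1.0 / (n_total - n_hits)
    running, best = 0.0, 0.0
    for (_, stat), is_hit in zip(ranked, hits):
        running += abs(stat) ** p / hit_norm if is_hit else -miss_step
        if abs(running) > abs(best):
            best = running
    return best


ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", -1.0), ("E", -2.0), ("F", -3.0)]
print(enrichment_score(ranked, {"A", "B"}))  # set concentrated at the top: positive score
print(enrichment_score(ranked, {"E", "F"}))  # set concentrated at the bottom: negative score
```

Because the score depends only on where the set's genes fall in the ranking, the choice of ranking statistic matters a lot.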

I always take GSEA results with a grain of salt: it's very easy to end up with a ton of significant results, depending on how many gene sets you throw at it. Many gene set databases (MSigDB, etc.) also have a lot of redundancy and overlap between gene sets. It helps to remove really large (say, >1000 members) or very small (say, <20 members) sets. Most often, I use GSEA as an orthogonal approach to my DE analysis, to highlight specific biological processes that are of interest in whatever we're studying.
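A minimal sketch of the two housekeeping steps mentioned here: dropping gene sets outside a size window (20 to 1000 genes, measured against the genes actually in your data), and Benjamini-Hochberg (BH) adjustment, a common choice of multiple-testing correction for both per-gene DE tests and per-set results. The gene-set names and p-values are hypothetical. The last lines also show how dozens of nominal p-values below 0.05 can all end up with adjusted values near 1, as in the question.

```python
def filter_by_size(gene_sets: dict[str, set[str]], measured: set[str],
                   min_size: int = 20, max_size: int = 1000) -> dict[str, set[str]]:
    """Keep gene sets whose overlap with the measured genes is in [min_size, max_size]."""
    kept = {}
    for name, genes in gene_sets.items():
        overlap = genes & measured
        if min_size <= len(overlap) <= max_size:
            kept[name] = overlap
    return kept


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# 1000 evenly spread nominal p-values (roughly what pure noise looks like):
# 49 are below 0.05, yet every adjusted value is about 1.
nominal = [(k + 1) / 1000 for k in range(1000)]
print(sum(p < 0.05 for p in nominal), round(min(benjamini_hochberg(nominal)), 6))
```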

So whether the results are actually trustworthy in your case is hard to say. Do they make sense biologically? Can you test some of them with other assays?

Hi.

Thanks a lot for the response. To answer your questions, this is what Galaxy says:

Currently the egsea.cnt function is implemented in this tool. This function takes a raw RNA-Seq count matrix and uses limma-voom with TMM normalization to convert the RNA-seq counts into expression values for EGSEA analysis.
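For intuition about the "expression values" mentioned in the quote, here is a minimal sketch of the log2 counts-per-million transform that voom applies, log2((count + 0.5) / (effective library size + 1) × 10^6), where the effective library size is the raw library size times a TMM normalization factor. The TMM factors are taken as given here (computing them is a separate trimmed-mean procedure), voom's precision weights are omitted, and the counts are hypothetical; this is not the EGSEA or limma source code.

```python
from math import log2


def log_cpm_voom_style(counts: list[list[int]],
                       norm_factors: list[float]) -> list[list[float]]:
    """log2-CPM as in voom; counts[g][s] is the raw count of gene g in sample s.

    ASSUMPTION: norm_factors are precomputed (e.g. TMM factors), one per sample.
    """
    n_samples = len(counts[0])
    lib_sizes = [sum(row[s] for row in counts) for s in range(n_samples)]
    eff_lib = [lib_sizes[s] * norm_factors[s] for s in range(n_samples)]
    # The 0.5 and 1.0 offsets keep zero counts finite on the log scale.
    return [[log2((row[s] + 0.5) / (eff_lib[s] + 1.0) * 1e6)
             for s in range(n_samples)]
            for row in counts]


# Two hypothetical genes in two samples, with normalization factors of 1.0.
print(log_cpm_voom_style([[0, 10], [999_999, 999_989]], [1.0, 1.0]))
```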

Yes, the results do make sense: in some viral infections we do not expect a transcriptomic storm. Instead, the cells balance the infectious state by adjusting the expression of groups of genes to stabilize themselves against, or combat, the virus attack. I plan to pick one pathway of interest and verify the expression of its genes by qPCR.

So in this case, my only worry is: would it be questioned if I used genes that are not individually differentially expressed (non-significant), but that contribute to the pathway enrichment, for qPCR validation?

Thanks

So in this case, my only worry is: would it be questioned if I used genes that are not individually differentially expressed (non-significant), but that contribute to the pathway enrichment, for qPCR validation?

This really depends on how you frame it. Finding zero DE genes is pretty rare, particularly for an experiment like this; even a very low concentration of DMSO will have an impact on the transcriptome in vitro. However, only 3 replicates per condition may not be enough to pick up subtle shifts in gene expression with a small effect size.
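A back-of-the-envelope way to see the replicate problem is the normal-approximation sample-size formula for comparing two group means, n per group = 2(z₁₋α/₂ + z₁₋β)² / d², where d is the effect size in standard-deviation units. This is only a rough sketch: it ignores count dispersion (dedicated RNA-seq power tools model that) and is optimistic at very small n, and the effect sizes and alpha values below are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate replicates per group for a two-sided, two-sample comparison.

    effect_size: difference in means divided by the within-group SD (Cohen's d).
    Uses the normal approximation, so it underestimates n when n is tiny.
    """
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


print(n_per_group(1.0))               # a one-SD shift at alpha = 0.05
print(n_per_group(2.0, alpha=0.001))  # a large shift under a stricter, multiple-testing-style alpha
```

Even a two-SD shift needs roughly three times the replicates used here once the per-gene threshold is tightened to survive multiple-testing correction.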

I'd consider picking the GSEA gene set in your results that you trust the most (or that makes the most sense biologically), validating a few of its members via qPCR, and then thinking about whether there's any way to make that response more robust: increasing the viral titer, altering the collection time point, etc. If there's funding for it, you could consider another RNA-seq experiment with the improved conditions (validated again with your qPCR genes before sequencing).
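For the qPCR step, a common way to get a number comparable to an RNA-seq log2 fold change is the ΔΔCt method: ΔCt = Ct(target) − Ct(reference gene) per sample, ΔΔCt = mean ΔCt(infected) − mean ΔCt(control), and log2 fold change = −ΔΔCt. The minimal sketch below assumes roughly 100% amplification efficiency for both genes and pairs target and reference Ct values by sample position; the Ct values are hypothetical.

```python
from statistics import mean


def log2_fc_delta_delta_ct(target_treated: list[float], ref_treated: list[float],
                           target_control: list[float], ref_control: list[float]) -> float:
    """log2 fold change (treated vs control) via the ddCt method, i.e. -ddCt.

    ASSUMPTIONS: ~100% PCR efficiency; Ct lists are paired by sample position.
    """
    if len(target_treated) != len(ref_treated) or len(target_control) != len(ref_control):
        raise ValueError("target and reference Ct lists must be paired per sample")
    d_treated = mean(t - r for t, r in zip(target_treated, ref_treated))
    d_control = mean(t - r for t, r in zip(target_control, ref_control))
    return -(d_treated - d_control)


# Hypothetical triplicates: the target crosses threshold one cycle earlier after infection.
lfc = log2_fc_delta_delta_ct([24, 24, 24], [18, 18, 18], [25, 25, 25], [18, 18, 18])
print(lfc, 2 ** lfc)  # log2 fold change and linear fold change
```

Comparing these qPCR log2 fold changes against the RNA-seq estimates for the same genes, even non-significant ones, gives a direct check on whether the subtle pathway-level shift is reproducible.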

Thanks a lot. That really makes sense and was helpful.

Appreciate your input and help.

Regards, Lalani
