Question

Accepted Workflows for Strongly Ontology/Pathway-Driven Analyses of RNA-seq Data

7.9 years ago

JMallory • 0

Are there generally accepted workflows for strongly ontology/pathway-driven analyses of RNA-seq data?

Specifically, I have a dataset of ~159,000 unique transcripts mapping to ENST identifiers that I've performed differential expression analysis on (healthy tissue vs. diseased). The PI I am currently working with is only interested in a specific class of glycoproteins and known pathways related to his disease process. I had a member of his group generate a list of relevant GO terms for both the glycoproteins and known disease pathways. I parsed this to the level of a unique gene list derived from all the GO terms (via biomaRt) and then used this list to filter my DE transcripts.

Now I have a list of transcripts related to the PI's molecules/disease of interest, sorted by p-value as given by the DE analysis (using edgeR). Simply stated, I have no idea what to do with this.

Intuition tells me I should attempt to integrate log2 fold change data somehow. The PI has suggested to just dump the top 1,000 ontology-filtered DE genes (by p-value) into the Cytoscape ReactomeFI plugin, run gene set analysis, and call it a day. At best, this seems uninformative and, at worst, a tautology since we've already highly preselected the genes to be used as input.

Has anyone else encountered a similar situation? Are there better ways of analyzing RNA-seq data when there are strong prior assumptions about what genes/transcripts/pathways will be considered?

RNA-Seq • 2.0k views

updated 7.9 years ago by Lluís R. ★ 1.2k • written 7.9 years ago by JMallory • 0

7.9 years ago

sysbiocoder ▴ 180

Use the full set of differentially expressed genes to determine the biologically significant pathways with GeneSCF: http://genescf.kandurilab.org/

Then check whether the pathway of interest is enriched. You cannot run an enrichment analysis on only the preselected genes, because the test needs a background of genes outside your selection to compare against.
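To illustrate why the background matters, here is a minimal Python sketch of a one-sided hypergeometric over-representation test. It is not GeneSCF's implementation; the function name and every count below are made up for illustration.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_set, n_de, n_overlap):
    """One-sided hypergeometric p-value, P(overlap >= n_overlap).

    n_background: genes tested in the DE analysis (the universe)
    n_in_set:     universe genes annotated to the pathway / GO term
    n_de:         genes called differentially expressed
    n_overlap:    DE genes that are also in the set
    """
    upper = min(n_in_set, n_de)
    total = comb(n_background, n_de)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20,000 tested genes, 200 in the pathway,
# 1,000 DE genes, 25 of which fall in the pathway (10 expected by chance).
p_full_background = hypergeom_enrichment_p(20000, 200, 1000, 25)

# Same pathway, but the universe was first filtered down to the pathway's
# own 200 genes: every DE gene is "in the set" by construction.
p_tautology = hypergeom_enrichment_p(200, 200, 25, 25)

print(f"full background:     p = {p_full_background:.3g}")
print(f"prefiltered universe: p = {p_tautology:.3g}")
```

With the prefiltered universe the overlap can never be smaller than the DE list itself, so the test has nothing to detect. That is the tautology the question worries about.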

score 1 · Accepted Answer · 2017-09-05

Gene set enrichment methods are designed precisely for this purpose. Given your list of genes (transcripts) of interest, you can apply these methods to the whole list of genes to see whether this group behaves differently from the rest, for instance whether it is more highly expressed in disease than in controls.

The most common GSEA methods are implemented in Bioconductor in the following packages: fgsea (a method similar to the one from the Broad Institute), limma, and GSVA. To test which GO terms are enriched, you could use goseq, topGO, or GOstats. Here I am assuming you already know what these transcripts do.

There are other types of analysis besides differential expression analysis, but choosing among them requires knowing what question you are trying to answer.
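As a rough sketch of the idea behind these rank-based methods, the Python below ranks all genes by a signed statistic that combines the direction of the log2 fold change with the p-value, then computes an unweighted running-sum enrichment score with a permutation p-value. This is a simplified stand-in and not the algorithm used by fgsea, limma, or GSVA; the gene names, fold changes, p-values, and function names are all invented for illustration.

```python
import math
import random


def signed_rank_metric(log_fc, p_value):
    """sign(logFC) * -log10(p); assumes 0 < p_value <= 1."""
    return math.copysign(-math.log10(p_value), log_fc)


def enrichment_score(ranked_genes, gene_set):
    """Unweighted KS-like running sum; returns the largest deviation from 0."""
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Compare the observed score against random gene sets of the same size."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, gene_set)
    size = sum(g in gene_set for g in ranked_genes)
    extreme = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(ranked_genes, size))
        if abs(enrichment_score(ranked_genes, random_set)) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


# Toy DE table (made-up values): gene -> (log2 fold change, p-value).
de_table = {
    "G1": (2.1, 1e-6), "G2": (1.8, 1e-5), "G3": (1.5, 1e-4),
    "G4": (0.9, 0.01), "G5": (0.2, 0.5), "G6": (-0.1, 0.8),
    "G7": (-0.7, 0.05), "G8": (-1.2, 1e-3), "G9": (-1.6, 1e-4),
    "G10": (-2.0, 1e-6),
}
# Rank ALL genes, not just the ontology-filtered ones.
ranked = sorted(de_table, key=lambda g: signed_rank_metric(*de_table[g]),
                reverse=True)
genes_of_interest = {"G1", "G2", "G3"}  # hypothetical ontology-derived set

es, p = permutation_p(ranked, genes_of_interest)
print(f"enrichment score = {es:.2f}, permutation p = {p:.3f}")
```

The ontology-derived gene list enters only as the gene set being tested, while the ranking uses every transcript from the DE analysis. A real analysis would use one of the packages above, which handle weighting, ties, and multiple testing properly.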