DESeq2 output and GO analysis
5.5 years ago
2nelly ▴ 350

Hi all,

I was wondering what people normally use as input for GO analysis (e.g. keggres) of RNA-seq data after producing the DESeq2 output:

-- the raw output of DESeq with all genes?
-- the output with NA genes filtered out?
-- the output filtered by p.adj value?

I noticed that the results are quite different, which of course is expected since the size of the input gene list differs.

Thank you in advance

RNA-Seq GO • 3.4k views

One should use significant genes (by whatever definition fits your scientific question, typically FDR < 0.05).


Yes, normally I filter the output by FDR and log2FC and then use it for GO annotation.

However, I am wondering whether this approach is correct or whether I should filter only by FDR, as you mentioned.

I am sceptical because every time I followed the first approach and ran GO analysis on RNA-seq data, the q-values for the pathways were extremely high. Biologically, the output makes sense, but how can I support the findings with q-values this high (> 0.9)?

With the second approach, the p-values improved a bit, but the q-values remained high.


The list of background genes is also highly important. I would use genes with a high enough baseMean, perhaps above the threshold at which you start observing significant genes.


That can be true. However, filtering for FDR < 0.05 mostly gives genes with high baseMean anyway, so this approach does not help much after FDR filtering.


That's your list. I was talking about the background list of genes. Usually it's all the genes of the organism, and then you'll get enrichment for the tissue your samples are from.


If I understand correctly, you suggest considering all genes with high baseMean, even those that are not significant (FDR > 0.05 or 0.1).


Only as a background. You compare the genes with FDR < 0.05 to the rest of the genes that have high enough expression.
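To make the two lists concrete, here is a minimal sketch of how they could be derived from a DESeq2-style results table. Everything in it is an assumption for illustration: the rows are made up, the `MIN_BASE_MEAN` cutoff is arbitrary, and `None` stands in for the `NA` that DESeq2 puts in `padj` for some genes.

```python
# Hypothetical rows mimicking the baseMean and padj columns of DESeq2 results.
# None stands in for an NA adjusted p-value.
results = [
    {"gene": "geneA", "baseMean": 850.0, "padj": 0.001},
    {"gene": "geneB", "baseMean": 420.0, "padj": 0.30},
    {"gene": "geneC", "baseMean": 3.0, "padj": None},
    {"gene": "geneD", "baseMean": 95.0, "padj": 0.04},
    {"gene": "geneE", "baseMean": 60.0, "padj": 0.80},
    {"gene": "geneF", "baseMean": 1.5, "padj": None},
]

MIN_BASE_MEAN = 10.0  # assumed expression cutoff, not a recommended value
FDR_CUTOFF = 0.05     # the threshold mentioned in the thread


def split_gene_lists(rows, min_base_mean=MIN_BASE_MEAN, fdr=FDR_CUTOFF):
    """Return (foreground, background) lists of gene names.

    background: every gene expressed above the cutoff, significant or not.
    foreground: the subset of the background with a non-missing padj < fdr.
    """
    expressed = [r for r in rows if r["baseMean"] >= min_base_mean]
    background = [r["gene"] for r in expressed]
    foreground = [
        r["gene"]
        for r in expressed
        if r["padj"] is not None and r["padj"] < fdr
    ]
    return foreground, background


foreground, background = split_gene_lists(results)
print("foreground:", foreground)
print("background:", background)
```

The foreground is always a subset of the background here, which is what the enrichment test needs: the non-significant but expressed genes are only used for comparison, they are never reported as hits.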


That is confusing me... Compare them in what context? How is this gonna help the GO annotation?


You'll have to understand how the test works. Some reading: https://david.ncifcrf.gov/helps/functional_annotation.html
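For a rough idea of what such a test computes, here is a sketch of a plain one-sided Fisher (hypergeometric) enrichment p-value for a single GO term. This is not DAVID's own scoring, which may differ in detail; the function name and all the counts below are made up for illustration.

```python
from math import comb


def enrichment_p_value(n_background, n_in_term, n_foreground, n_overlap):
    """One-sided hypergeometric tail P(overlap >= n_overlap).

    n_background: size of the background gene list
    n_in_term:    background genes annotated to the GO term
    n_foreground: size of the significant (foreground) gene list
    n_overlap:    foreground genes annotated to the GO term
    """
    max_overlap = min(n_in_term, n_foreground)
    total = comb(n_background, n_foreground)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_foreground - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Same foreground (100 genes, 12 in the term, term has 50 genes),
# tested against two different hypothetical backgrounds.
p_expressed = enrichment_p_value(1000, 50, 100, 12)   # expressed genes only
p_genome = enrichment_p_value(20000, 50, 100, 12)     # whole genome
print(p_expressed, p_genome)
```

The foreground is identical in both calls; only the background changes, and the p-value changes with it. That is why the choice of background matters as much as the choice of significant genes.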


Yes, OK, it's Fisher's test.

Let me rephrase and simplify the main question: would you use all genes for GO annotation, or only a subset of significant genes?

In other words, would you use a full unfiltered list of DE genes or a filtered one? This will definitely affect the calculation of the adjusted p-value. Imagine doing a Bonferroni correction on two sets of 100 genes and 1000 genes: the corrected p-values will be different. Of course FDR is more robust, but it is still affected. Filtering is subjective and can produce different results.

According to the DAVID example, the 300 genes are the list of DE genes. So I assume my main process of filtering the DESeq output by logFC and FDR is correct.


The FDR is for the number of pathways you test, not genes.
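A small sketch of this point, assuming the q-values come from a Benjamini-Hochberg correction (GO tools differ in which correction they apply, so treat this as an illustration only). The p-values are invented; what matters is that the adjustment depends on how many terms were tested, not on how many genes were in the input list.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Three hypothetical term-level p-values, corrected on their own ...
top_terms = [0.001, 0.02, 0.04]
q_three_tests = benjamini_hochberg(top_terms)

# ... and the same three among 30 tested terms (the other 27 uninformative).
q_thirty_tests = benjamini_hochberg(top_terms + [1.0] * 27)[:3]

print(q_three_tests)
print(q_thirty_tests)
```

The same raw p-values get noticeably larger q-values once 30 terms are tested instead of 3, which is one way very high q-values can arise even when the top terms look biologically sensible.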


Yes, that is the FDR for the GO analysis. For instance, if 30 pathways were found, the adjusted p-value is corrected for the 30 different tests.

The FDR I mentioned before is the one from the DE gene output, i.e. from DESeq.

Would you feed a GO analysis tool or algorithm a filtered DE gene file or a non-filtered one? Any further calculation of the pathways' p- and q-values will be affected by this choice.


I would use all genes with FDR < 0.05 (or 0.1). Yet, this wasn't my point.


Yes, I understand, but this was my main question. Thank you.
