Gene set enrichment analysis with logFC and PValue
2
2
Entering edit mode
6.2 years ago
Vasu ▴ 790

Hi,

I have RNA-Seq data with 100 samples and 30k genes. Samples as columns and genes as rows. Tumor vs Normal.

After filtering step I see around 19k genes were used for differential analysis. With differential analysis cutoff FC > 2 and FDR < 0.05, there are about 1000 differential expressed genes.

I'm going to use foldchange and Pvalue for ranking the genes and input for GSEA

For Gene set enrichment analysis do I need to use only those 1000 differential expressed genes or do I need to use those 19k genes as input?

thanq

RNA-Seq gsea r gene expression genesetenrichment • 16k views
ADD COMMENT
1
Entering edit mode

Hi, I have one more problem when I use this formula: signed fold change * -log10pvalue, as some of the pvalue = 0. So -log10pvalue returns infinite, which can not be processed further in GSEA. So what should I do with those pvalue = 0? Is it suitable if I simply replace the inf to some extremely large number in the rnk file? Thank you in advance!

ADD REPLY
1
Entering edit mode

Yes, for example .Machine$double.xmin in R.

ADD REPLY
7
Entering edit mode
6.2 years ago
h.mon 35k

You have to use the 19k genes, but how you will do so depends on the enrichment method you are using. For GSEA, you have to use the ranked vector of all 19k genes.

ADD COMMENT
1
Entering edit mode

I'm actually using this for GSEA. Do you think ranking genes based on FC and Pvalue like below is right?

x <- read.table("DE_genes.txt",sep = "\t",header = T)
head(x)
x$fcsign <- sign(x$logFC)
x$logP=-log10(x$p_value)
x$metric= x$logP/x$fcsign
y<-x[,c("Gene", "metric")]
head(y)
write.table(y,file="DE_genes.rnk",quote=F,sep="\t",row.names=F)

I will use that DE_genes.rnk as input for GSEA. Could you please tell me something about this. thanq

ADD REPLY
1
Entering edit mode

This seems fine, it is the same metric as used at Gene Set Enrichment Analysis (GSEA) explained.

ADD REPLY
0
Entering edit mode

if i do it only with p value or adjusted p value would it be logical?

ADD REPLY
1
Entering edit mode

Yes it will be logical, but it's just that fold change for few genes might be very less to be called as differentially expressed.

ADD REPLY
0
Entering edit mode

Hi h.mon, In the hyperlink provided, it is written as "signed fold change * -log10pvalue" In the above-mentioned comment, the following is used: x$logP/x$fcsign

Are both of them are similar?

ADD REPLY
0
Entering edit mode

Hi, I tried this function in R: x$fcsign <- sign(x$logFC), and it only returns you 1 or -1. So it would not change the result between "x$logP/x$fcsign" and "signed fold change * -log10pvalue".

ADD REPLY
0
Entering edit mode

I found a similar question here and some of the response could be useful to my own question: Many DESeq2 P values are 0 thus preventing generation of a rank list for GSEA

ADD REPLY
0
Entering edit mode

In DESeq2, stat is very similar to this metric. A simple plot of DESeq2 stat vs your metric: DESeq2 stat vs metric.

X-axis is stat from DESeq2 Bioconductor R package and y-axis is the metric calculated as you described. Some more details can be found here and in DESeq2 vignette.

ADD REPLY
0
Entering edit mode
6.2 years ago

Hi, Basically, if you want to check on which biological state you gene set belongs its ideal to check some of differentially expressed top up-regulated and down-regulated genes. That number depends upon you. I saw people taking top 100 also and other times the whole set of up and down-regulated genes.

ADD COMMENT

Login before adding your answer.

Traffic: 1547 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6