edgeR: Very Low P-Value And Very High Variance Within The Group Of Replicates. What's My Problem?
11.5 years ago
valentina ▴ 60

I'm using edgeR to perform a differential expression analysis on data from an RNA-seq experiment.

I have 6 tumor cell samples, all from the same tumor type and the same treatment: 3 patients with a good prognosis and 3 patients with a bad prognosis. I want to compare gene expression between the two groups.

I ran the edgeR package as follows:

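library(edgeR)

# Read the raw count table; the GENE column becomes the row names
x <- read.delim("my_reads_count.txt", row.names="GENE")
# The first three samples are group 1, the last three are group 2
group <- factor(c(1,1,1,2,2,2))
y <- DGEList(counts=x,group=group)
y <- calcNormFactors(y)      # compute normalization factors
y <- estimateCommonDisp(y)   # one dispersion shared by all genes
y <- estimateTagwiseDisp(y)  # gene-wise dispersions
et <- exactTest(y)           # test group 2 against group 1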

I obtained some very odd results: in some cases I got a very low p-value and FDR, but looking at the raw data it seems obvious that the difference between the two groups can't be significant. This is an example from my_reads_count.txt:

GENE             sample1_1  sample1_2  sample1_3  sample2_1  sample2_2  sample2_3
ENSG00000198842  0          3          2          2          6666       3
ENSG00000257017  3          3          25         2002       29080      4

And from my_edgeR_resulta.txt:

GENE             logFC         logCPM        PValue        FDR
ENSG00000198842  9.863211e+00  5.4879462930  5.368843e-07  1.953612e-04
ENSG00000257017  9.500927e+00  7.7139869397  8.072384e-10  7.171947e-07

I would like the variance within each group to be taken into account. Can anyone help me? Any suggestions?

edger rna-seq differential-expression expression • 5.6k views

Is your raw data normalized?


The raw data are the counts of reads mapping within the exons (obtained by running htseq-count). The normalization is performed with calcNormFactors(y). Is that correct?

11.5 years ago

The variance is considered, but your signal is apparently a lot higher than the variance.

These two genes have monster-levels of expression in your "Group 2" -- you're looking at a locus with less than 10 reads in group 1, and thousands to tens-of-thousands of reads in group 2.

Are the library sizes wildly different between samples?
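A quick way to check this is to sum the counts per sample. The sketch below uses a simulated count table (the values and the gene names are made up for illustration, they are not the poster's data); with the real table you would call colSums() on the data frame read from my_reads_count.txt instead.

```r
# Toy count table (hypothetical values, NOT the poster's data):
# rows are genes, columns are the six samples.
set.seed(1)
counts <- matrix(rnbinom(6000, mu = 50, size = 2), ncol = 6,
                 dimnames = list(paste0("gene", 1:1000),
                                 c("sample1_1", "sample1_2", "sample1_3",
                                   "sample2_1", "sample2_2", "sample2_3")))

# Library size = total number of counted reads in each sample.
lib_sizes <- colSums(counts)
print(lib_sizes)

# Ratio of the largest to the smallest library. A ratio far above 1
# means the samples were sequenced to very different depths.
size_ratio <- max(lib_sizes) / min(lib_sizes)
print(size_ratio)
```

Once a DGEList has been built, the same totals should also be visible in y$samples$lib.size.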

You might consider filtering out genes that do not exhibit minimal expression in at least 3 samples, which should remove your first gene (ENSG00000198842), and possibly your second gene.
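As a rough sketch of that filter, the block below applies a "CPM above 1 in at least 3 samples" rule to the two genes quoted in the question. The real library sizes were not posted, so a library size of 20 million reads per sample is assumed purely for illustration, and the CPM cutoff of 1 is an arbitrary choice as well.

```r
# The two genes from the question, with their raw counts.
counts <- matrix(c(0, 3,  2,    2,  6666, 3,
                   3, 3, 25, 2002, 29080, 4),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("ENSG00000198842", "ENSG00000257017"),
                                 c("sample1_1", "sample1_2", "sample1_3",
                                   "sample2_1", "sample2_2", "sample2_3")))

# ASSUMPTION: 20 million reads per library (not given in the post).
lib_sizes <- rep(20e6, 6)

# Counts per million: divide each sample's counts by its library size.
cpm_mat <- t(t(counts) / lib_sizes) * 1e6

# Keep genes with CPM > 1 (hypothetical cutoff) in at least 3 samples.
keep <- rowSums(cpm_mat > 1) >= 3
print(keep)
```

Under these assumed library sizes the first gene fails the filter (only one sample is above the cutoff) and the second gene just passes it; with different library sizes or a different cutoff the second gene could be dropped as well. On a DGEList, edgeR's cpm() function computes the CPM values for you.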


Do I have to remove these genes before the edgeR analysis, or after?


You should remove them before you call the DGEList function. I don't think there's a standard way of figuring out the cutoff. However, the edgeR User's Guide mentions multiple ways to go about removing genes based upon low expression levels.
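To make the order of operations concrete, here is a minimal sketch that filters a count table first and only then builds the DGEList. The count table is simulated, and the rule "at least 10 reads in at least 3 samples" is just one hypothetical cutoff, not a recommendation from the User's Guide. The edgeR part only runs if the package is installed.

```r
# Simulated count table (hypothetical data): 500 genes, 6 samples.
set.seed(42)
group <- factor(c(1, 1, 1, 2, 2, 2))
x <- matrix(rnbinom(6 * 500, mu = 100, size = 5), ncol = 6,
            dimnames = list(paste0("gene", 1:500),
                            c("sample1_1", "sample1_2", "sample1_3",
                              "sample2_1", "sample2_2", "sample2_3")))
# Make the first 100 genes very lowly expressed.
x[1:100, ] <- rnbinom(600, mu = 0.5, size = 5)

# Step 1: filter BEFORE building the DGEList.
# Hypothetical rule: at least 10 reads in at least 3 samples.
keep <- rowSums(x >= 10) >= 3
x_filtered <- x[keep, , drop = FALSE]
cat("genes before:", nrow(x), " after:", nrow(x_filtered), "\n")

# Step 2: run the same edgeR steps as in the question on the filtered table.
if (requireNamespace("edgeR", quietly = TRUE)) {
  library(edgeR)
  y <- DGEList(counts = x_filtered, group = group)
  y <- calcNormFactors(y)
  y <- estimateCommonDisp(y)
  y <- estimateTagwiseDisp(y)
  et <- exactTest(y)
  print(topTags(et))
}
```

Because the library sizes are computed when the DGEList is created, filtering first means they are based on the retained genes only.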
