DESeq Analysis With Two Samples Without Replicates: Most padj Equal To 1 And NA
11.0 years ago
xiaojuhu13 ▴ 150

I only have two samples, without replicates, for a DESeq analysis, but the results look abnormal: most FDR values (padj) are equal to 1.

> library(DESeq)
> counts <- read.table(file = "48_50_1", header = TRUE, row.names = 1)
> my.design <- data.frame(row.names = colnames(counts), condition = c("L", "H"))
> conds <- factor(my.design$condition)
> cds <- newCountDataSet(counts, conds)
> cds <- estimateSizeFactors(cds)
> sizeFactors(cds)
      low      high 
0.9225312 1.0839742 
> cds <- estimateDispersions(cds, method = "blind", sharingMode = "fit-only")
> res <- nbinomTest(cds, "L", "H")
> head(res)
     id baseMean baseMeanA baseMeanB foldChange log2FoldChange pval padj
1   23B        0         0         0        NaN            NaN   NA   NA
2 5HT2A        0         0         0        NaN            NaN   NA   NA
3  A1BG        0         0         0        NaN            NaN   NA   NA
4  A1CF        0         0         0        NaN            NaN   NA   NA
5   A2M        0         0         0        NaN            NaN   NA   NA
6 A2ML1        0         0         0        NaN            NaN   NA   NA

After removing the genes with zero counts, only 6 gene IDs have a padj not equal to 1, out of 332 gene IDs in total.
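A small base-R sketch of the "trim, then count" step, using a hypothetical toy table that mimics the `id`/`baseMean`/`pval` columns of `nbinomTest()` output (all values invented). It assumes, as far as I know from DESeq's documentation, that `padj` is a Benjamini-Hochberg adjustment of `pval`; here that is recomputed with `p.adjust()` over only the genes that were actually tested:

```r
# Hypothetical toy table mimicking nbinomTest() columns; values are invented
res <- data.frame(
  id       = c("G1", "G2", "G3", "G4", "G5", "G6"),
  baseMean = c(0, 12.5, 0, 340.2, 5.1, 48.0),
  pval     = c(NA, 1, NA, 0.0004, 1, 0.8)
)

# Genes with zero counts in both samples come back with pval = NA; drop them
tested <- res[!is.na(res$pval), ]

# Benjamini-Hochberg adjustment over the tested genes only
tested$padj <- p.adjust(tested$pval, method = "BH")

n_tested  <- nrow(tested)            # genes that were actually tested
n_not_one <- sum(tested$padj < 1)    # adjusted p-values different from 1
n_sig     <- sum(tested$padj < 0.1)  # genes passing a 10% FDR cutoff

print(tested)
cat("tested:", n_tested, "| padj < 1:", n_not_one, "| padj < 0.1:", n_sig, "\n")
```

Large raw p-values are pushed to exactly 1 because BH caps adjusted values at 1, so a table dominated by padj = 1 is what weak evidence looks like after adjustment.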

deseq • 10k views

As with your earlier question ("edgeR Results Without Replicates, FDR Looks Unnormal"), why do you find this unusual? Without replicates, you have almost no power to detect anything.
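To make the power argument concrete, here is a hedged simulation sketch with one library per condition and no true difference at all. The mean of 100 reads and the biological dispersion of 0.1 are assumptions for illustration, not values from this dataset; the point is how often pure noise alone produces an apparent 2-fold change:

```r
# Hypothetical simulation: one library per condition, NO true difference.
# Assumed parameters: mean 100 reads per gene, dispersion 0.1 (size = 1/0.1).
set.seed(1)
n_genes    <- 10000
mu         <- 100
dispersion <- 0.1

sample_L <- rnbinom(n_genes, mu = mu, size = 1 / dispersion)
sample_H <- rnbinom(n_genes, mu = mu, size = 1 / dispersion)

# log2 fold change with a pseudocount of 1 to avoid log(0)
log2fc <- log2((sample_H + 1) / (sample_L + 1))

# Share of unchanged genes that look at least 2-fold changed by chance
frac_2fold <- mean(abs(log2fc) > 1)
cat("fraction of unchanged genes with |log2FC| > 1:", round(frac_2fold, 3), "\n")
```

With a single sample per condition there is nothing to tell this noise apart from a real 2-fold effect, so a sensible test has to return large p-values for most genes.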


Yes. After removing genes with pval = NA, only 332 were left, out of more than 20,000 genes in total.


That alone seems a bit odd; I've never had a library cover so few genes. You might look at the alignments to see if something is wrong with them.
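Before going back to the alignments, a quick sanity check on the count table itself can show whether the libraries really cover so few genes. This is a minimal sketch on a hypothetical toy table standing in for the one read from `"48_50_1"` (gene names and counts are invented); a tiny library size or very few detected genes would suggest looking upstream at alignment or read counting:

```r
# Hypothetical toy count table standing in for read.table("48_50_1", ...)
counts <- data.frame(
  low  = c(0, 0, 15, 230, 0, 4),
  high = c(0, 3, 22, 190, 0, 0),
  row.names = paste0("g", 1:6)
)

lib_size <- colSums(counts)            # total reads assigned per library
detected <- colSums(counts > 0)        # genes with at least one read
all_zero <- sum(rowSums(counts) == 0)  # genes with no reads in either sample

print(lib_size)
print(detected)
cat("genes with zero counts in both samples:", all_zero, "of", nrow(counts), "\n")
```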


For the NA rows you are showing, note that the fold change values are also NaN (Not a Number) and the base means are 0. The NaN here does not come from overflow or underflow (overflow gives Inf, underflow gives 0); it comes from dividing zero by zero: the fold change is the ratio of the two base means, and 0/0 is undefined, so R returns NaN, and the log2 of NaN is again NaN. Given the base means of zero, those are all genes in which you simply have no read coverage, which is also why their p-values are reported as NA.
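A short base-R illustration of where NaN does and does not come from. It assumes, as the column names suggest, that `foldChange` is `baseMeanB / baseMeanA`:

```r
# 0/0 is undefined, so R returns NaN rather than raising an error
fc <- 0 / 0
is.nan(fc)         # TRUE
is.nan(log2(fc))   # TRUE: the log2 fold change inherits the NaN

# Overflow gives Inf and underflow gives 0; neither produces NaN
big   <- .Machine$double.xmax * 2
small <- .Machine$double.xmin / 1e300
is.infinite(big)   # TRUE
small == 0         # TRUE

# An all-zero gene, assuming foldChange = baseMeanB / baseMeanA
baseMeanA <- 0
baseMeanB <- 0
foldChange     <- baseMeanB / baseMeanA
log2FoldChange <- log2(foldChange)
```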

I suspect something wonky is going on with your dataset, as suggested above. Also, without replicates there will of course be a power issue, so you may not want to invest too much in the p-values; you'll just have lots of potential false positives in your dataset.

11.0 years ago

If you have no replicates, is it even worth using fancy software like DESeq? Wouldn't you just be looking at ratios? You can do that yourself in Excel.
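If you only want ratios, they are still easier to interpret after library-size normalization. Below is a hedged sketch of the median-of-ratios idea that DESeq's `estimateSizeFactors()` is based on, followed by plain log2 ratios with a pseudocount of 1. The toy counts are invented, and this is a reimplementation for illustration, not DESeq's own code:

```r
# Hypothetical toy counts: one library per condition (invented values)
counts <- matrix(
  c(10, 100, 40, 0, 250,
    20, 180, 90, 5, 480),
  ncol = 2,
  dimnames = list(paste0("g", 1:5), c("L", "H"))
)

# Median-of-ratios size factors: compare each sample with the per-gene
# geometric mean, using only genes with non-zero counts in every sample
log_geo_means <- rowMeans(log(counts))
use <- is.finite(log_geo_means)
size_factors <- apply(counts, 2, function(x) {
  exp(median((log(x) - log_geo_means)[use]))
})

# Normalize each column by its size factor, then take log2 ratios H vs L
norm_counts <- sweep(counts, 2, size_factors, "/")
log2_ratio <- log2((norm_counts[, "H"] + 1) / (norm_counts[, "L"] + 1))

print(size_factors)
print(round(log2_ratio, 2))
```

Without replicates these ratios carry no measure of variability, so they are a ranking of candidates rather than a test.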

2.2 years ago

NOISeq gave me good results (fold change and expression difference) for data without replicates. I used the following tutorial: https://jiankaiwang.gitbooks.io/bioinfo-and-combio/content/ngs/noiseq_differential_expression_in_rna-seq.html
