Question

DESeq Analysis With Two Samples Without Replicates, Most Padj Equal To 1 And NA

11.5 years ago

xiaojuhu13 ▴ 150

I only have two samples, without replicates, for a DESeq analysis, but the results look abnormal: most FDR values equal 1.

> counts = read.table(file="48_50_1", header=T, row.names=1)
> my.design<-data.frame(row.names=colnames(counts),condition=c("L","H"))
> conds <- factor(my.design$condition)
> cds <- newCountDataSet( counts, conds )
> cds <- estimateSizeFactors( cds )
> sizeFactors( cds )
      low      high 
0.9225312 1.0839742 
> cds<-estimateDispersions(cds, method='blind',sharingMode='fit-only')
> cds<-nbinomTest(cds,"L","H")
> head(cds)
     id baseMean baseMeanA baseMeanB foldChange log2FoldChange pval padj
1   23B        0         0         0        NaN            NaN   NA   NA
2 5HT2A        0         0         0        NaN            NaN   NA   NA
3  A1BG        0         0         0        NaN            NaN   NA   NA
4  A1CF        0         0         0        NaN            NaN   NA   NA
5   A2M        0         0         0        NaN            NaN   NA   NA
6 A2ML1        0         0         0        NaN            NaN   NA   NA

After removing the genes with zero counts, 332 gene IDs remain in total, and only 6 of them have a padj not equal to 1.
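The filtering step described above can be sketched in base R. This is only an illustration: `res` is a hypothetical stand-in for the data frame returned by `nbinomTest()`, and the values in it are made up.

```r
# Toy stand-in for the nbinomTest() result table; all values are made up.
res <- data.frame(
  id       = c("g1", "g2", "g3", "g4", "g5"),
  baseMean = c(0, 0, 12.5, 48.0, 3.1),
  pval     = c(NA, NA, 0.004, 1, 1),
  padj     = c(NA, NA, 0.012, 1, 1)
)

# Keep only the genes that were actually tested, i.e. those with a non-NA p-value.
tested <- res[!is.na(res$pval), ]

# Count the tested genes and how many of them have an adjusted p-value below 1.
n_tested    <- nrow(tested)
n_below_one <- sum(tested$padj < 1)

cat(n_below_one, "of", n_tested, "tested genes have padj < 1\n")
```

Reporting both numbers together makes it easier to see whether the problem is the small number of tested genes or the adjusted p-values themselves.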

deseq • 11k views

ADD COMMENT • link updated 2.7 years ago by rutuja.digraskar • 0 • written 11.5 years ago by xiaojuhu13 ▴ 150

As with your other question, "Edger Results Without Replicates, Fdr Looks Unnormal", why do you find this unusual? Without replicates, you have almost no power to detect anything.
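A rough sketch of why so many adjusted values end up at 1, assuming padj is a Benjamini-Hochberg adjustment as computed by base R's `p.adjust()`. The p-values below are hypothetical, chosen to mimic a low-power test where only a few genes get small p-values:

```r
# Hypothetical p-values for 332 tested genes: five smallish ones, the rest spread over 0.5 to 1.
pval <- c(1e-6, 1e-5, 1e-4, 0.01, 0.02, seq(0.5, 1, length.out = 327))

# Benjamini-Hochberg adjustment; values are capped at 1.
padj <- p.adjust(pval, method = "BH")

# How many genes survive at FDR 0.1, and how many are pushed all the way to 1.
n_significant <- sum(padj < 0.1)
n_equal_one   <- sum(padj == 1)

cat(n_significant, "genes with padj < 0.1;", n_equal_one, "genes with padj == 1\n")
```

Note that even a raw p-value of 0.01 is far from significant after adjustment in this toy setting, which is the pattern described in the question.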

ADD REPLY • link 11.5 years ago by Devon Ryan 105k

Yeah, after removing the rows with pval=NA, only 332 were left. The total is more than 20,000 genes.

ADD REPLY • link 11.5 years ago by xiaojuhu13 ▴ 150

That alone seems a bit odd; I've never had a library cover so few genes. You might look at the alignments to see if they're wonky.

ADD REPLY • link 11.5 years ago by Devon Ryan 105k

In the rows you are showing, along with the NAs, your fold change values are NaN (Not a Number) and your base means are 0. NaN is what floating-point arithmetic returns for an undefined operation such as 0/0, and that is what happens here: the fold change is the ratio of the two base means, and both are 0. Given the base means of zero, I would assume those are all genes for which you simply have no read coverage.

I suspect something wonky is going on with your dataset, as suggested. Also, of course, there will be a power issue because of the lack of replicates, so you may not want to invest too much in the p-values: you'll just have lots of potential false positives in your dataset.
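For rows where both base means are 0, the fold change is a 0/0 division, which R reports as NaN. A minimal sketch, with hypothetical variable names standing in for the baseMeanA and baseMeanB columns:

```r
# Both conditions have zero normalized counts for this gene.
base_mean_a <- 0
base_mean_b <- 0

# Ratio of the two base means: 0/0 is undefined, so R returns NaN.
fold_change      <- base_mean_b / base_mean_a
log2_fold_change <- log2(fold_change)

print(is.nan(fold_change))       # TRUE
print(is.nan(log2_fold_change))  # TRUE

# For contrast: reads in only one sample give Inf rather than NaN.
one_sided <- 5 / 0
print(is.infinite(one_sided))    # TRUE
```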

ADD REPLY • link 11.5 years ago by DG 7.3k

score 1 · Answer 1 · 2013-11-26

11.5 years ago

swbarnes2 15k

If you have no replicates, is it even worth using fancy software like DESeq? Wouldn't you just be looking at ratios? You can do that yourself in Excel.
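The same ratios can also be computed in a few lines of base R. This is a sketch under stated assumptions: the counts are made up, the size factors are the ones printed in the question, and the pseudocount of 1 is an arbitrary choice to avoid dividing by zero.

```r
# Hypothetical raw counts for the two unreplicated samples.
counts <- data.frame(
  low  = c(0, 10, 200, 35),
  high = c(0, 40, 180, 0),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

# Size factors as reported by sizeFactors(cds) in the question.
size_factors <- c(low = 0.9225312, high = 1.0839742)

# Divide each column by its size factor to get normalized counts.
norm_counts <- sweep(as.matrix(counts), 2, size_factors, "/")

# Log2 ratio of high over low, with a pseudocount (assumption) to keep it finite.
pseudo     <- 1
log2_ratio <- log2((norm_counts[, "high"] + pseudo) / (norm_counts[, "low"] + pseudo))

# Rank genes by the size of the change; this is a ranking, not a significance test.
ranked <- sort(abs(log2_ratio), decreasing = TRUE)
print(round(log2_ratio, 2))
```

The result only orders genes by effect size; it says nothing about which differences would hold up with replicates.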

ADD COMMENT • link 11.5 years ago by swbarnes2 15k

score 0 · Answer 2 · 2022-09-30

2.7 years ago

rutuja.digraskar • 0

NOISeq gave me good results, with fold change and expression difference, for data without replicates. I used the following tutorial: https://jiankaiwang.gitbooks.io/bioinfo-and-combio/content/ngs/noiseq_differential_expression_in_rna-seq.html

ADD COMMENT • link 2.7 years ago by rutuja.digraskar • 0