Question

Questionable Cuffdiff Test Statistic/P-Value Output

3

Entering edit mode

10.6 years ago

Jason ▴ 60

I'm analyzing Illumina RNA-Seq data from three FADS1 knockout mice and three wild types. After running cuffdiff on the six samples, I see in genes.read_group_tracking that:

$ egrep "tracking_id|Fads1" genes.read_group_tracking | column -t | less -S
tracking_id  condition  replicate  raw_frags  internal_scaled_frags  external_scaled_frags  FPKM       effective_length  status
Fads1        null       0          1          0.880439               0.880439               0.0142817  -                 OK
Fads1        null       1          1          1.09692                1.09692                0.017661   -                 OK
Fads1        null       2          0          0                      0                      0          -                 OK
Fads1        wt         0          347        331.474                331.474                5.38128    -                 OK
Fads1        wt         1          251        246.101                246.101                3.96235    -                 OK
Fads1        wt         2          156        167.516                167.516                2.69709    -                 OK

yet in genes.fpkm_tracking and gene_exp.diff I see:

$ egrep "tracking|Fads1" genes.fpkm_tracking | cut -f 1,10-20 | column -t | less -S
tracking_id  null_FPKM  null_conf_lo  null_conf_hi  null_status  wt_FPKM  wt_conf_lo  wt_conf_hi  wt_status
Fads1        0.0106476  0             0.0303051     OK           4.01357  2.80322     5.21247     OK

$ egrep "q_value|Fads1" gene_exp.diff | cut -f 1,5-15 | column -t | less -S     
test_id  sample_1  sample_2  status  value_1    value_2  log2(fold_change)  test_stat  p_value  q_value   significant
Fads1    null      wt        OK      0.0106476  4.01357  8.55822            4.1466     0.2495   0.999688  no

Where it seems like the confidence intervals are clearly separated, yet the P-value is much higher than my intuition would expect. Any ideas of things to check?

For completeness the cuffdiff command was: cuffdiff -p 12 -o ./analysis/FADS1/cuffdiff -b /mnt/data1/refs/igenomes/Mus_musculus/UCSC/mm10/Sequence/WholeGenomeFasta/genome.fa -L null,wt /mnt/data1/refs/igenomes/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.gtf <bams,from,null> <bams,from,wt>

• 6.0k views

ADD COMMENT • link updated 10.6 years ago by Charles Warden 8.3k • written 10.6 years ago by Jason ▴ 60

0

Entering edit mode

I noticed a similar problem....did you solve this yet? Thanks!

ADD REPLY • link 10.6 years ago by jie.wu.phd • 0

0

Entering edit mode

We used edgeR and it seemed to detect the gene as DE without any problems. I don't have the p-value off-hand though.

ADD REPLY • link 10.6 years ago by Jason ▴ 60

score 4 · Answer 1 · 2014-04-01

On one hand, I think cuffdiff can often yield conservative estimates. It also can give weird results, but I tend to expect those more from non-ideal experimental designs (such as groups without replicates). However, I agree that what you are seeing looks like is probably not a reasonable result.

Either way, you can see some benchmarks for different tools here (for example, in the patient comparison, cuffdiff couldn't identify any differentially expressed genes):

http://cdwscience.blogspot.com/2013/11/rna-seq-differential-expression.html

I would recommend taking the FPKM values (or raw counts, depending upon the strategy) from cufflinks. You can use sRAP for that analysis, but there are lots of ways to use the cufflinks results without having to use cuffdiff (I prefer DESeq among the popular RNA-Seq tools, but I've heard limma is good as well).

http://bioconductor.org/packages/release/bioc/html/sRAP.html

http://bioconductor.org/packages/release/bioc/html/DESeq.html