Question

Cuffdiff missing values for one sample:

0

Entering edit mode

9.3 years ago

jambler24 ▴ 30

I've been doing differential expression analysis between 24 different samples in a pairwise fashion (1 vs 12, 2, vs 13 etc...) and for some reason two of the cuffDiff runs seem to be missing one of the samples reads:

For the genes.count_tracking file:

tracking_id     S11_count       S11_count_variance      S11_count_uncertainty_var       S11_count_dispersion_var        S11_status      S23_count       S23_count_variance      S23_count_uncertainty_var       S23_count_dispersion_var        S23_status
XLOC_000001     33437.6 7.76171e+07     0       7.10768e+07     OK      0       8.85715 0       0       OK
XLOC_000002     9653.58 2.06459e+07     0       2.05202e+07     OK      0       3.79735 0       0       OK
XLOC_000003     0       1       0       0       OK      0.986189        2       0       1.59757 OK
XLOC_000004     78.2659 166367  0       166366  OK      3.94476 7       0       6.39026 OK
XLOC_000005     2.55775 5437    0       5436.89 OK      2.95857 5       0       4.7927  OK
XLOC_000006     0       1       0       0       OK      0       1       0       0       OK
XLOC_000007     5.06907 10776   0       10775.1 OK      0       1       0       0       OK
XLOC_000008     637.689 9475.35 0       9486.91 OK      0       0       0       0       OK
XLOC_000009     124.699 525     0       524.256 OK      0       0       0       0       OK
XLOC_000010     694.462 11220   0       11220   OK      0       0       0       0       OK

For the cds.read_group_tracking file:

tracking_id     condition       replicate       raw_frags       internal_scaled_frags   external_scaled_frags   FPKM    effective_length        status
P1      S11     0       22813.6 23128.8 23128.8 396.39  -       OK
P1      S23     0       0       0       0       0       -       OK
P10     S11     0       0       0       0       0       -       OK
P10     S23     0       0       0       0       0       -       OK
P100    S11     0       2222    2252.69 2252.69 372.465 -       OK
P100    S23     0       1       0.986189        0.986189        -nan    -       OK
P1000   S11     0       2250    2281.08 2281.08 529.99  -       OK
P1000   S23     0       0       0       0       0       -       OK
P1001   S11     0       2208    2238.5  2238.5  106.648 -       OK
P1001   S23     0       1       0.986189        0.986189        -nan    -       OK
P1002   S11     0       19030   19292.9 19292.9 1234.82 -       OK
P1002   S23     0       0       0       0       0       -       OK
P1003   S11     0       4986    5054.88 5054.88 611.858 -       OK
P1003   S23     0       0       0       0       0       -       OK
P1004   S11     0       1308    1326.07 1326.07 179.374 -       OK
P1004   S23     0       0       0       0       0       -       OK
P1005   S11     0       3439    3486.51 3486.51 188.68  -       OK
P1005   S23     0       0       0       0       0       -       OK
P1006   S11     0       1144    1159.8  1159.8  388.503 -       OK
P1006   S23     0       0       0       0       0       -       OK
P1007   S11     0       3555    3604.11 3604.11 145.245 -       OK
P1007   S23     0       1       0.986189        0.986189        -nan    -       OK
P1008   S11     0       5202    5273.86 5273.86 1181.96 -       OK
P1008   S23     0       0       0       0       0       -       OK

etc...

The other runs worked perfectly, its just two that have this problem. Around 95% of the reads aligned to the reference genome for all the samples, and there are no errors being thrown by any of the tools in the pipeline.

Any ideas what else could be causing this?

RNA-Seq cuffdiff • 2.8k views

ADD COMMENT • link updated 2.2 years ago by Ram 44k • written 9.3 years ago by jambler24 ▴ 30

0

Entering edit mode

Do I understand your question correctly and cuffdiff skips the first sam/bam/cxb file? If you use the sample-sheet option to pass these files to cuffdiff, you may have forgotten to write a header line in the sample-sheet file.

ADD REPLY • link updated 2.2 years ago by Ram 44k • written 9.3 years ago by nterhoeven ▴ 120

0

Entering edit mode

The command I used is:

cuffdiff \
  -p 8 \
  -o cuffDiff_out \
  -b mycobacterium_tuberculosis_cdc1551_supercontigs.fa \
  -L S11,S23 \
  -u cuffmerge_result/merged.gtf sample_11/accepted_hits.bam sample_23/accepted_hits.bam

And is the same for the other comparisons (e.g. sample 4 vs sample 16) which worked. That's the strange part.

ADD REPLY • link updated 2.2 years ago by Ram 44k • written 9.3 years ago by jambler24 ▴ 30

0

Entering edit mode

Maybe it helps using the sample-sheet option. This should be less error prone anyway...

You can find an explanation of this here:

http://cole-trapnell-lab.github.io/cufflinks/file_formats/index.html#sample-sheets-for-cuffdiff-and-cuffnorm

ADD REPLY • link 9.3 years ago by nterhoeven ▴ 120

0

Entering edit mode

Have a look at the BAM files in IGV and see if the numbers make any sense.

BTW, I hope you realize that pairwise comparisons between samples aren't going to produce very meaningful results.

ADD REPLY • link 9.3 years ago by Devon Ryan 104k

0

Entering edit mode

Had a look, but nothing was immediately apparent... Still looking into it while I wait for the run using a sample sheet. And yes, don't worry, this was just for the ''make sure it runs" phase :)

ADD REPLY • link 9.3 years ago by jambler24 ▴ 30

0

Entering edit mode

Why so Devon? Even when I run n samples, the output is for all C(n,2) combinations. What is the difference between running each pair individually and letting cuffdiff do the pairing for me?

ADD REPLY • link 9.1 years ago by Ram 44k

0

Entering edit mode

Any 1 vs. 1 comparison will be largely meaningless. You're fitting a model with no degrees of freedom and can't use the results to predict anything aside from differences between the two samples being compared (i.e., you found that two people/animals/whatever are different...so what?). The only time I can think of this being useful is in the clinic where each patient is an N of 1, but even there you'd really want a background population of people to compare them against in order for the results to be very interpretable.

Pairwise comparisons between groups, on the other hand, often makes total sense.

ADD REPLY • link 9.1 years ago by Devon Ryan 104k

0

Entering edit mode

Yes, that's what we are doing - pairwise comparisons among cohorts. Ah I see the "between samples" now.

ADD REPLY • link 2.2 years ago by Ram 44k