I am very confused about the results of cufflinks-cuffdiff versus cuffdiff alone. When I use the first approach (cufflinks followed by cuffdiff) to detect differential expression in a Drosophila data set, the differentially expressed genes differ from those I get when I use cuffdiff alone.
Some genes are not even TESTED with one approach or the other. This is very worrying, since the biological interpretation will depend on the approach used.
Example (cufflinks-cuffdiff)
XLOC_008606 CecA1 NT_033777.3:30210873-30211273 sin_spiro con_spiro OK 3.65287 51.0277 3.80418 2.83677 5.00E-05 0.0155562 yes
XLOC_008607 CecA2 NT_033777.3:30212155-30212568 sin_spiro con_spiro OK 2.47732 32.7192 3.72329 2.62197 0.0001 0.0284457 yes
XLOC_009396 Cpr47Eg NT_033778.4:11277946-11278762 sin_spiro con_spiro OK 14.2928 113.849 2.99375 2.89525 5.00E-05 0.0155562 yes
(cuffdiff)
gene17163 gene17163 CecA1 NT_033777.3:30210873-30211273 sin_spiro con_spiro NOTEST 0 0 0 0 1 1 no
gene17165 gene17165 CecA2 NT_033777.3:30212155-30212568 sin_spiro con_spiro NOTEST 0 0 0 0 1 1 no
gene7138 gene7138 Cpr47Eg NT_033778.4:11277956-11278445 sin_spiro con_spiro NOTEST 0 0 0 0 1 1 no
But in the case of CecA1, for example, the locus is identical with both approaches (NT_033777.3:30210873-30211273), so I would expect at least similar numbers of mapped reads for CecA1.
Thank you
We have to keep the coordinates of genes separate from the coordinates of transcripts. The only thing we can infer is that the transcript lies within the gene coordinates; the gene coordinates themselves tell us very little about the transcript coordinates.
If most reads map to a region that is not annotated as an exon in the original GFF, those reads won't be counted (or at least shouldn't be counted) at the gene level by cuffdiff. But once you annotate that region with cufflinks, the same cuffdiff process will count them differently.
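To see how widespread the discrepancy is, you could join the two gene-level tables on the gene symbol and list the genes whose test status differs. Below is a rough sketch, assuming both tables are tab-separated and have `gene` and `status` header columns, as in cuffdiff's `gene_exp.diff`. The function names and the trimmed two-column example tables are made up for illustration. Novel loci without a gene symbol would need to be matched some other way.

```python
import csv
import io

def load_status(handle):
    """Map gene symbol -> test status from a tab-separated table
    that has 'gene' and 'status' header columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["gene"]: row["status"] for row in reader}

def status_mismatches(run_a, run_b):
    """Return {gene: (status_a, status_b)} for genes present in both
    runs whose status differs."""
    shared = run_a.keys() & run_b.keys()
    return {g: (run_a[g], run_b[g]) for g in sorted(shared) if run_a[g] != run_b[g]}

# Illustrative tables trimmed to the two columns used here;
# real files have many more columns.
linked = load_status(io.StringIO("gene\tstatus\nCecA1\tOK\nCecA2\tOK\n"))
direct = load_status(io.StringIO("gene\tstatus\nCecA1\tNOTEST\nCecA2\tNOTEST\n"))

for gene, (a, b) in status_mismatches(linked, direct).items():
    print(gene, a, b)
```

With real data you would pass open file handles for the two `gene_exp.diff` files instead of the in-memory strings.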
Use IGV to line up the transcript coordinates from the Cufflinks output against those in the original GFF file. There is a good chance the former has regions that are missing from the latter, which would explain the difference.
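If you want the same comparison without a browser, here is a minimal sketch of the interval arithmetic: given exon intervals from the assembled annotation and from the reference annotation (same chromosome, 1-based inclusive coordinates), it reports the assembled stretches that no reference exon covers. The function names are hypothetical, and extracting the exon intervals from the GTF/GFF files is left out.

```python
def merge(intervals):
    """Merge overlapping or adjacent closed intervals (1-based, inclusive)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]

def uncovered(assembled, reference):
    """Return the parts of the assembled intervals not covered by any
    reference interval."""
    ref = merge(reference)
    gaps = []
    for start, end in merge(assembled):
        pos = start  # first position not yet accounted for
        for r_start, r_end in ref:
            if r_end < pos:
                continue  # reference interval lies entirely to the left
            if r_start > end:
                break     # reference interval lies entirely to the right
            if r_start > pos:
                gaps.append((pos, r_start - 1))
            pos = max(pos, r_end + 1)
            if pos > end:
                break
        if pos <= end:
            gaps.append((pos, end))
    return gaps

# Illustration only: the two Cpr47Eg locus spans from the question,
# each treated as a single interval (real use would pass exon intervals).
cufflinks_span = [(11277946, 11278762)]
reference_span = [(11277956, 11278445)]
print(uncovered(cufflinks_span, reference_span))
```

For Cpr47Eg the assembled locus extends past the reference locus on both sides, which fits the explanation above. The locus span alone does not show where the reads actually fall, so the read alignments still need to be checked.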
Thank you for your response. In your experience, do you think it is better to use cufflinks-cuffdiff than cuffdiff alone? Initially I decided to use cuffdiff alone, since I am working with Drosophila, which seems to be a well-annotated organism.
The choice needs to be made based on the phenomena under study - for example, once you establish that there is indeed a novel transcript variant then it makes sense to use that.
Clearly, though, doing so carries a higher burden of proof: one now needs to establish both that the new transcript does indeed exist and that it is differentially expressed.