Hello, I am having a hard time figuring out whether a data set I just received is of poor quality. The library is paired-end and strand-specific (first strand), so I aligned it with TopHat using the option "--library-type=fr-firststrand".
My problem is that after the Cufflinks/Cuffdiff protocol, most of the gene/transcript IDs in the output files have a bad "status", either "fail" or "no test". I have run Cufflinks and Cuffdiff with and without the --library-type option and got the same poor results. This is surprising, especially because TopHat aligned about 500 million reads. Unless the library complexity is horrible?
My command lines are the following:
tophat -g 1 -G genes.gtf --library-type=fr-firststrand -o output BowtieIndex/genome input.fastq input.fastq
cufflinks -u -v -b genome.fa -g genes.gtf -o output input.bam
cuffmerge -g genes.gtf -s genome.fa cuffmerge.txt
cuffdiff -u -b genome.fa cuffmerge/merged.gtf -o output sample1.bam sample2.bam
I ran Cufflinks and Cuffdiff with and without "--library-type=fr-firststrand".
Any input on what I might be doing wrong would be appreciated. G.
Have you looked at the FPKM distribution in the GTF files themselves? What about your alignment percentage? What sort of pre-processing did you do?
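If it helps, here is a rough Python sketch for the FPKM check. It assumes the usual Cufflinks transcripts.gtf layout, where each "transcript" line carries an FPKM "..." attribute in the ninth column. The function names (transcript_fpkms, summarize) are made up for this example and are not part of Cufflinks.

```python
import re
import statistics

# Assumption: Cufflinks writes the abundance as an attribute like  FPKM "12.3456";
FPKM_RE = re.compile(r'FPKM "([^"]+)"')


def transcript_fpkms(gtf_lines):
    """Yield the FPKM of every 'transcript' record in a Cufflinks-style GTF."""
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip comment lines
        fields = line.rstrip("\n").split("\t")
        # A GTF record has 9 tab-separated columns; column 3 is the feature type.
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # exon lines repeat the transcript FPKM, so ignore them
        match = FPKM_RE.search(fields[8])
        if match:
            yield float(match.group(1))


def summarize(fpkms):
    """Return a few summary numbers describing the FPKM distribution."""
    values = sorted(fpkms)
    if not values:
        return {"n": 0}
    return {
        "n": len(values),
        "zero": sum(1 for v in values if v == 0.0),  # transcripts with no signal
        "median": statistics.median(values),
        "max": values[-1],
    }


# Example use (the path is a placeholder for your own Cufflinks output):
# with open("output/transcripts.gtf") as handle:
#     print(summarize(transcript_fpkms(handle)))
```

A very large share of zero-FPKM transcripts, or a median close to zero despite hundreds of millions of aligned reads, would point toward a library or annotation problem rather than a Cuffdiff setting.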