Question

Strand Specific Tophat Output


11.6 years ago

GPR ▴ 390

Hello, I am having a hard time working out whether a data set I just received is of poor quality. The library is paired-end and strand-specific (first strand), so I aligned it with TopHat using the option "--library-type=fr-firststrand".

My problem is that after running the Cufflinks/Cuffdiff protocol, most of the gene/transcript IDs in the output files have a bad "status", either "fail" or "no test". I have run Cufflinks and Cuffdiff with and without the --library-type option and got the same poor results. This is surprising, especially because TopHat aligned about 500 million reads. Unless the library complexity is horrible?
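To put a number on "most", one option is to tally the status column of the Cuffdiff output. The sketch below is a minimal illustration, not part of the original workflow: it assumes the tab-delimited gene_exp.diff file that Cuffdiff writes into its output directory, with a header row containing a "status" column. The function name tally_status and the path in the usage note are hypothetical.

```python
import csv
from collections import Counter


def tally_status(diff_path):
    """Count how often each value occurs in the 'status' column.

    Assumes a tab-delimited Cuffdiff *.diff file with a header row
    that includes a 'status' column (e.g. OK, NOTEST, LOWDATA, FAIL).
    """
    with open(diff_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return Counter(row["status"] for row in reader)


def format_tally(counts):
    """Return one 'STATUS<TAB>count<TAB>percent' line per status value."""
    total = sum(counts.values())
    lines = []
    for status, n in counts.most_common():
        pct = 100.0 * n / total if total else 0.0
        lines.append(f"{status}\t{n}\t{pct:.1f}%")
    return "\n".join(lines)
```

For example, `print(format_tally(tally_status("output/gene_exp.diff")))` would list each status with its count and share of all tested genes, which makes it easier to see whether the failures are dominated by one status.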

My command lines are the following:

tophat -g 1 -G genes.gtf --library-type=fr-firststrand -o output BowtieIndex/genome input.fastq input.fastq

cufflinks -u -v -b genome.fa -g genes.gtf -o output input.bam

cuffmerge -g genes.gtf -s genome.fa cuffmerge.txt

cuffdiff -u -b genome.fa cuffmerge/merged.gtf -o output sample1.bam sample2.bam

I ran cufflinks and cuffdiff with and without "--library-type=fr-firststrand".

Any input on what I might be doing wrong would be appreciated. G.


Updated 11.1 years ago by Biostar 20 • written 11.6 years ago by GPR ▴ 390


Have you looked at the FPKM distribution in the GTF files themselves? What about your alignment percentage? What sort of pre-processing did you do?

11.6 years ago by chris.mit7 ▴ 60
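The FPKM check suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: it expects a Cufflinks-style transcripts.gtf in which each "transcript" record carries an FPKM "..." attribute in the ninth column. The function names, the threshold of 1.0, and the file path in the usage note are hypothetical choices, not anything prescribed by Cufflinks.

```python
import re
import statistics

# Matches an attribute such as: FPKM "12.3400000000";
FPKM_RE = re.compile(r'FPKM "([^"]+)"')


def read_transcript_fpkms(gtf_path):
    """Collect the FPKM value of every 'transcript' record in a GTF file."""
    fpkms = []
    with open(gtf_path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            # Exon records repeat the transcript FPKM, so keep transcripts only.
            if len(fields) < 9 or fields[2] != "transcript":
                continue
            match = FPKM_RE.search(fields[8])
            if match:
                fpkms.append(float(match.group(1)))
    return fpkms


def summarize_fpkms(fpkms, threshold=1.0):
    """Summarize an FPKM list; 'threshold' is an arbitrary illustrative cutoff."""
    return {
        "n": len(fpkms),
        "n_zero": sum(1 for v in fpkms if v == 0.0),
        "n_at_or_above_threshold": sum(1 for v in fpkms if v >= threshold),
        "median": statistics.median(fpkms) if fpkms else None,
        "max": max(fpkms) if fpkms else None,
    }
```

Calling `summarize_fpkms(read_transcript_fpkms("output/transcripts.gtf"))` gives a quick view of how many transcripts have zero or very low FPKM. A distribution piled up near zero despite deep sequencing would be consistent with the low-complexity concern raised in the question.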