Question

Lots of transcripts start at 0 and end at 1 in Cufflinks output GTF file

0

7.8 years ago

wenhui • 0

Hi, everyone.

I used TopHat2 + Cufflinks + Cuffmerge to process my RNA-seq data. The annotation file I used is UCSC hg38. I saw some strange results in the merged GTF file: there are lots of transcripts starting at 0 and ending at 1. It really confuses me. Should I keep these records or just remove them?

Any advice is appreciated.

Thank you.

RNA-Seq Assembly • 1.8k views

updated 7.8 years ago by Charles Plessy ★ 2.9k • written 7.8 years ago by wenhui • 0

score 3 · Answer 1 · 2017-06-14

3

7.8 years ago

Charles Plessy ★ 2.9k

A long time ago, I tried Bowtie/TopHat/Cufflinks with GENCODE as the reference transcript annotation, and I also saw a lot of very short transcripts. They were created from GENCODE annotations of codons of interest (start, stop, selenocysteine) or nucleotides of interest (polyadenylation sites, ...). Perhaps there is a similar explanation in your case?
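If you decide to drop such records, here is a minimal sketch of how it could be done in Python. It is not part of Cufflinks or any other tool, and it rests on several assumptions: the merged file is a standard 9-column, tab-separated GTF; every record carries a `transcript_id` attribute; and a transcript whose whole genomic span is 3 nt or less (codon-sized, as in the GENCODE case above) is an artifact. The function name `drop_short_transcripts` and the `min_span` threshold are made up for illustration, so adjust them to your data.

```python
import re

# Matches the transcript_id attribute in the 9th GTF column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def _transcript_id(line):
    """Return the transcript_id of a GTF record, or None if there is none."""
    fields = line.split("\t")
    if line.startswith("#") or len(fields) < 9:
        return None
    match = TRANSCRIPT_ID.search(fields[8])
    return match.group(1) if match else None


def drop_short_transcripts(gtf_lines, min_span=4):
    """Return GTF lines without transcripts spanning fewer than min_span bases.

    min_span=4 is an assumed threshold: it removes anything that is
    codon-sized (3 nt) or shorter, including records with start 0 and end 1.
    """
    lines = [line.rstrip("\n") for line in gtf_lines]

    # First pass: record the leftmost start and rightmost end per transcript.
    bounds = {}
    for line in lines:
        tid = _transcript_id(line)
        if tid is None:
            continue
        fields = line.split("\t")
        start, end = int(fields[3]), int(fields[4])
        lo, hi = bounds.get(tid, (start, end))
        bounds[tid] = (min(lo, start), max(hi, end))

    # Transcripts whose total span is below the threshold.
    short = {tid for tid, (lo, hi) in bounds.items() if hi - lo + 1 < min_span}

    # Second pass: keep comments and every record of the remaining transcripts.
    return [line for line in lines if _transcript_id(line) not in short]
```

To use it, pass an open file handle (for example, the merged GTF written by Cuffmerge) and write the returned lines to a new file, keeping the original so that you can compare the two. Before discarding anything, it is worth checking a few of the removed records against the reference annotation to confirm where they come from.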

On my side, I stopped using Cufflinks. While I have not tried them, perhaps the next generation of tools, namely HISAT, StringTie or Ballgown, may give you better results?

0

Thank you for your reply. I have tried HISAT2 + StringTie, but the software I use for downstream analysis only supports GTF files produced by Cufflinks. This restriction may have something to do with some prefix in Cufflinks' GTF files.

Usually, I prefer HISAT2 + StringTie in my work, and I have found it to be much faster than TopHat2 + Cufflinks.

Thank you anyway.

7.8 years ago by wenhui • 0