Lots of transcripts start at 0 and end at 1 in Cufflinks output GTF file
7.4 years ago
wenhui • 0

Hi, everyone.

I used TopHat2 + Cufflinks + Cuffmerge to process my RNA-seq data. The annotation file I used is UCSC hg38. I saw some strange results in the merged GTF file: lots of transcripts start at 0 and end at 1. This really confuses me. Should I keep these records or remove them?
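
For anyone who wants to look at these records before deciding, here is a minimal Python sketch that lists the GTF lines whose start is 0 and whose end is 1. It only assumes the standard nine tab-separated GTF columns. The function name `find_zero_one_records` and the two example lines (including the `XLOC`/`TCONS` identifiers and coordinates) are made up for illustration and are not taken from the merged file in question.

```python
def find_zero_one_records(gtf_lines):
    """Return (seqname, feature, attributes) for records with start 0 and end 1.

    Assumes standard 9-column, tab-separated GTF lines; comment lines,
    blank lines and malformed lines are skipped.
    """
    hits = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        seqname, _source, feature, start, end = fields[:5]
        if int(start) == 0 and int(end) == 1:
            hits.append((seqname, feature, fields[8]))
    return hits


# Hypothetical example records, for illustration only.
example = [
    'chr1\tCufflinks\texon\t0\t1\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001";',
    'chr1\tCufflinks\texon\t11869\t12227\t.\t+\t.\tgene_id "XLOC_000002"; transcript_id "TCONS_00000002";',
]

for seqname, feature, attributes in find_zero_one_records(example):
    print(seqname, feature, attributes)
```

On a real file, the same function can be called on an open file handle, since it only iterates over lines. Note that GTF coordinates are 1-based by convention, so a start of 0 is unusual in itself and worth inspecting.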

Any advice is appreciated.

Thank you.

RNA-Seq Assembly • 1.7k views
7.4 years ago
Charles Plessy ★ 2.9k

A long time ago, I tried Bowtie/TopHat/Cufflinks with GENCODE as the reference transcript annotation, and I also saw a lot of very short transcripts that were created from annotations of codons of interest (start/stop/selenocysteine) or nucleotides of interest (polyadenylation sites, ...) in GENCODE. Perhaps there is a similar explanation in your case?
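
One way to check this idea is to count, per feature type, how many very short records the reference annotation contains. The sketch below is only an illustration under assumptions: it expects standard 9-column GTF lines, the helper name `count_short_features` and the 3 bp default threshold are my own choices, and the example records are invented rather than copied from GENCODE or UCSC.

```python
from collections import Counter


def count_short_features(gtf_lines, max_length=3):
    """Count records per feature type whose length is at most max_length bp.

    Length is computed as end - start + 1, because GTF intervals are
    1-based and inclusive. Comment, blank and malformed lines are skipped.
    """
    counts = Counter()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        feature, start, end = fields[2], int(fields[3]), int(fields[4])
        if end - start + 1 <= max_length:
            counts[feature] += 1
    return counts


# Hypothetical annotation records, for illustration only.
annotation = [
    'chr1\tHAVANA\texon\t65419\t65433\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tHAVANA\tstart_codon\t65565\t65567\t.\t+\t0\tgene_id "G1"; transcript_id "T1";',
    'chr1\tHAVANA\tstop_codon\t70006\t70008\t.\t+\t0\tgene_id "G1"; transcript_id "T1";',
]

print(count_short_features(annotation))
```

If the short records in the annotation are mostly codon-type features, that would be consistent with the explanation above, although it would not prove it.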

For my part, I stopped using Cufflinks. I have not tried them myself, but perhaps the next generation of tools, namely HISAT, StringTie and Ballgown, may give you better results?


Thank you for your reply. I have tried HISAT2 + StringTie, but the software I use for downstream analysis only supports GTF files produced by Cufflinks. It may have something to do with some prefix in the Cufflinks GTF file.

Usually I prefer HISAT2 + StringTie in my work, and I have found it much faster than TopHat2 + Cufflinks.

Thank you anyway.

