Can TopHat Deal With Overlapping Transcripts When Given --GTF?
11.1 years ago
bw. ▴ 260

I'd like to run TopHat2 on a 50 bp single-end RNA-seq dataset to get gene counts for differential expression analysis. I was going to run it with --GTF ensembl_genes.gtf, since the TopHat2 paper reports that this leads to significant gains in sensitivity and accuracy. What I'm wondering is: how will overlapping Ensembl transcripts affect the results?
When TopHat generates a FASTA file from my Ensembl gene file, it includes multiple overlapping sequences. It seems like these would lead to ambiguous alignments, and that I would need to merge overlaps before running TopHat, but I'm not finding any discussion of this on the forum or in the papers, so I wanted to double-check.

Thanks -Ben

tophat gtf • 2.7k views
11.1 years ago

TopHat deals with this situation just fine. At the mapping stage, overlapping transcripts are not really a big problem. They are harder to deal with at the quantification step (after alignment), which is what Cufflinks and other quantification tools try to address.
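
To illustrate why overlap hurts at counting rather than at mapping, here is a minimal Python sketch. The gene names, coordinates, and the `__ambiguous` / `__no_feature` labels are all made up for the example, and the counting rule is the simplest possible one; it is not how Cufflinks or any particular tool works internally. Each read has a single genomic position, but a read that falls in the shared region of two genes cannot be assigned to either one without further modelling.

```python
from collections import Counter

# Hypothetical gene intervals (chrom, start, end), 1-based inclusive.
# These are illustrative values, not real Ensembl annotations.
GENES = {
    "geneA": ("chr1", 100, 500),
    "geneB": ("chr1", 400, 900),    # overlaps geneA on 400-500
    "geneC": ("chr1", 2000, 2500),
}


def genes_hit(chrom, start, end, genes=GENES):
    """Return the sorted names of genes whose interval overlaps the alignment."""
    return sorted(
        name
        for name, (g_chrom, g_start, g_end) in genes.items()
        if g_chrom == chrom and start <= g_end and end >= g_start
    )


def count_reads(reads, genes=GENES):
    """Count reads per gene; reads overlapping several genes are tallied separately."""
    counts = Counter()
    for chrom, start, end in reads:
        hits = genes_hit(chrom, start, end, genes)
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif len(hits) > 1:
            counts["__ambiguous"] += 1
        else:
            counts["__no_feature"] += 1
    return counts


# Four 50 bp alignments, each with exactly one genomic position.
reads = [
    ("chr1", 150, 199),    # inside geneA only
    ("chr1", 420, 469),    # inside the geneA/geneB overlap
    ("chr1", 2100, 2149),  # inside geneC only
    ("chr1", 5000, 5049),  # outside every gene
]
counts = count_reads(reads)
print(dict(counts))
```

The second read aligns without any ambiguity, yet it ends up in the ambiguous bin. That is the part quantification tools have to resolve, typically with a statistical model rather than a simple rule like this one.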


That is not entirely true. TopHat may mispredict novel splice junctions by choosing a splicing motif from the wrong strand.
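
For context on the strand issue, a small Python sketch of how strand can be guessed from the intron motif. It only handles the canonical GT-AG motif, the intron sequence is invented, and `intron_strand` is a hypothetical helper, not a TopHat function. The point is that an intron from a reverse-strand gene reads as CT...AC on the forward genome strand, so where genes on opposite strands overlap, a candidate junction can plausibly match a motif for either strand.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(complement)[::-1]


def intron_strand(intron_seq):
    """Guess the transcript strand from the intron's terminal dinucleotides.

    intron_seq is the intron as read on the forward genome strand.
    Only the canonical GT-AG motif is considered (a simplification).
    """
    seq = intron_seq.upper()
    if seq[:2] == "GT" and seq[-2:] == "AG":
        return "+"
    if seq[:2] == "CT" and seq[-2:] == "AC":
        return "-"
    return "."  # no canonical motif on either strand


# Invented intron from a forward-strand gene.
plus_intron = "GTAAGTCCCTTTCAG"
# The same intron as it would appear if the gene were on the reverse strand.
minus_intron = reverse_complement(plus_intron)

print(intron_strand(plus_intron), intron_strand(minus_intron))
```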


TopHat is designed with overlapping transcripts, as well as antisense transcripts, in mind, but you are certainly correct that TopHat can and does produce false positive (and false negative) results.
