I am aligning my RNA-seq reads with both TopHat2 and STAR. STAR is a lot faster, but the pipeline that my advisor uses is based on TopHat.
We are profiling expression levels of the LINE-1 retrotransposon.
I need to keep track of unmapped, multi-mapped, and uniquely mapped reads, and she pointed out that how this is done can vary depending on the aligner.
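To make the read categories concrete, here is a minimal sketch of how I would classify a single SAM record. It assumes the aligner writes the `NH` tag (number of reported alignments), which both TopHat2 and STAR do by default as far as I know. I use `NH` rather than MAPQ because the MAPQ value for a unique hit differs between the two (255 in STAR, 50 in TopHat2). The function name `classify_sam_line` is my own, not part of any tool.

```python
def classify_sam_line(line):
    """Classify one SAM alignment line as 'unmapped', 'unique', 'multi', or 'unknown'.

    Assumption: the aligner emits the NH:i:<n> optional tag.
    """
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])

    # SAM FLAG bit 0x4 means the read is unmapped.
    if flag & 0x4:
        return "unmapped"

    # Optional tags start at column 12 (index 11).
    for tag in fields[11:]:
        if tag.startswith("NH:i:"):
            return "unique" if int(tag[5:]) == 1 else "multi"

    # No NH tag found: cannot decide from this record alone.
    return "unknown"


if __name__ == "__main__":
    example = "r2\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1"
    print(classify_sam_line(example))
```

Note that this classifies records, not reads: a multi-mapped read appears once per reported alignment, so the records would have to be collapsed by read name before counting. Also, STAR only keeps unmapped reads in the BAM if run with `--outSAMunmapped Within`, whereas TopHat2 writes them to a separate `unmapped.bam`.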
But I am curious: if I use tools like RepeatMasker, would it matter which aligner I use?
To briefly describe our current pipeline: after aligning and deduplicating, we output separate BAM files for multi-mapped, uniquely mapped, and unmapped reads. We then merge the multi-mapped and unmapped reads and realign them to a pseudogenome (using Bowtie's strata option). From that we get the mapped repeat names, normalize the read counts, and group them by element/family/class.
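The last step (normalize and group) could be sketched roughly as below. This is only an illustration under assumptions I am making up for the example: repeat names are formatted as `element:family:class` (e.g. `L1HS:L1:LINE`), and normalization is simple counts per million of total reads. Our actual naming and normalization may differ, and `group_and_normalize` is a hypothetical helper.

```python
from collections import Counter


def group_and_normalize(repeat_names, total_reads, level="family"):
    """Count reads per repeat group and scale to counts per million.

    repeat_names: one 'element:family:class' string per mapped read
                  (hypothetical naming scheme).
    total_reads:  library size used for normalization (assumed CPM).
    level:        'element', 'family', or 'class'.
    """
    index = {"element": 0, "family": 1, "class": 2}[level]
    counts = Counter(name.split(":")[index] for name in repeat_names)
    scale = 1_000_000 / total_reads
    return {group: n * scale for group, n in counts.items()}


if __name__ == "__main__":
    names = ["L1HS:L1:LINE", "L1PA2:L1:LINE", "AluY:Alu:SINE"]
    print(group_and_normalize(names, total_reads=1_000_000, level="family"))
```

Nothing in this step depends on the aligner itself, only on which reads end up in the multi-mapped and unmapped files that feed the realignment.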
Thank you very much for your time and help!
Thanks for your reply. Judging by the link you provided, I may have worded my question poorly. I understand that accuracy varies between the two, but I was more curious about how using STAR versus TopHat changes the expression analysis process.
I would essentially be getting the same output BAM file, no? Barring the difference in accuracy, that is. The only thing I can think of is that TopHat can align paired-end reads together with their singletons (from preprocessing QC), whereas STAR cannot. But the chance of there being any retrotransposon-derived reads among these singletons (which are <1% in most cases) is so small that losing them may be a worthwhile tradeoff for STAR's alignment speed.