When identifying unique reads, if the tophat alignment is set to allow mismatches, I assume that a read with a single perfect alignment may be tagged as having multiple alignments, because additional alignments containing mismatches are also accepted. On the other hand, if the tophat alignment is set to disallow any mismatches, even reads that have a single unique alignment with one mismatch will be excluded. Is it possible to set the tophat parameters so that 1 mismatch is allowed only if a read has 0 alignments, then, if this still yields 0 alignments, 2 mismatches are allowed, and so on (until a maximum of x mismatches is reached)? Or is this best accomplished after the alignment is made, by filtering the output files (e.g., by alignment quality scores) before passing them to Cufflinks? Either way, how can this be accomplished? Thanks.
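One way to picture the post-alignment option is a small filter over SAM text (for example the output of samtools view -h accepted_hits.bam). The sketch below is only an illustration, not a Tophat feature: the function names tiered_unique and edit_distance are made up for this example, and it assumes every record carries an NM:i tag. Note that NM is the edit distance, so it counts indel bases as well as mismatches. For each read (and each mate separately) it keeps the alignment only if exactly one alignment exists at the lowest NM value found, up to a maximum of x. It can only choose among alignments the aligner actually wrote to the file.

```python
from collections import defaultdict


def edit_distance(sam_line):
    """Return the NM:i value of a SAM record, or None if the tag is absent."""
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith("NM:i:"):
            return int(field[5:])
    return None


def tiered_unique(sam_lines, max_mismatches):
    """Keep a read only if it has exactly one alignment at its lowest NM tier.

    Hypothetical helper for illustration: alignments with NM above
    max_mismatches are ignored, header lines and unmapped records are skipped.
    """
    groups = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):  # SAM header line
            continue
        cols = line.split("\t")
        flag = int(cols[1])
        if flag & 0x4:  # read is unmapped
            continue
        # Treat the two mates of a pair as separate reads.
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        nm = edit_distance(line)
        if nm is None or nm > max_mismatches:
            continue
        groups[(cols[0], mate)].append((nm, line))

    kept = []
    for hits in groups.values():
        best = min(nm for nm, _ in hits)
        best_hits = [line for nm, line in hits if nm == best]
        if len(best_hits) == 1:  # unique at the lowest tier
            kept.append(best_hits[0])
    return kept
```

With max_mismatches set to 2, a read with one perfect hit and one 1-mismatch hit is kept as its perfect hit, while a read with two 1-mismatch hits is dropped as a multi-aligner.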
So the accepted_hits.bam will contain single best alignments, and multi-aligners only if they have the same AS? Thank you.

Yes. If more than one alignment has the best AS (alignment score), Tophat2 will report all of them. However, the default setting for --max-multihits is 20, which means that if there are 30 alignments that all have the best AS, Tophat2 will report 20 of them, chosen at random. If there are only 5 alignments with the best AS, all of them will be reported. Alignments with the second-best AS won't be reported unless you use the --report-secondary-alignments option.

It's all clear now, thank you, but I have a follow-up question: when the MAPQ score 4 (single best alignment) is assigned, does it take both reads in the pair into consideration, so that even if each read on its own is a multi-aligner, the pair may be unique? (In that case, each of these reads would get MAPQ 4 even though each on its own is a multi-aligner.)
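
For reference, the reporting rule described in the reply above (all alignments tied for the best AS, capped at --max-multihits) can be written as a toy model. This is not Tophat2's code, only a sketch of the behaviour as stated in this thread. The function name reported_hits and the (label, score) tuples are invented for the example, and it assumes a higher AS is better.

```python
import random


def reported_hits(alignments, max_multihits=20, rng=None):
    """Toy model: return the alignments tied for the best AS, capped at max_multihits.

    alignments is a list of (label, alignment_score) tuples for one read.
    If more alignments tie for the best score than max_multihits allows,
    a random subset of that size is returned.
    """
    if not alignments:
        return []
    rng = rng or random.Random()
    best = max(score for _, score in alignments)
    top = [hit for hit in alignments if hit[1] == best]
    if len(top) > max_multihits:
        top = rng.sample(top, max_multihits)
    return top
```

With 30 equally good alignments this returns 20 of them. With 5 best-scoring alignments plus some lower-scoring ones it returns just the 5.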