Hi all,
I was wondering if someone has looked at this issue in detail. From using `STAR` for a while, and (_as much as I can remember!_) `Tophat`/`Tophat2` as well, I remember that RNA-seq mappers differ from genomic mappers in how they output the multimappers. `STAR` does the following:
- if the read maps to few distinct regions (the number is set by `--outFilterMultimapNmax`; I think it is 10 by default), all of the positions are output, and the read is considered mapped;
- if the read maps to more than N positions, the read is considered unmapped and is reported as such.
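To make the two cases concrete, here is a toy model of the rule as I described it. It is only a sketch of my understanding, not `STAR`'s actual code; the function name `star_like_outcome` is made up, and the default of 10 is the value I believe applies.

```python
def star_like_outcome(n_loci, nmax=10):
    """Toy model of the rule described above (hypothetical helper, not STAR code).

    n_loci: number of distinct positions the read aligns to.
    nmax:   stands in for --outFilterMultimapNmax (assumed default: 10).
    Returns (status, number_of_positions_reported).
    """
    if n_loci == 0:
        return ("unmapped", 0)       # no alignment at all
    if n_loci <= nmax:
        return ("mapped", n_loci)    # every position is reported
    return ("unmapped", 0)           # too many positions: treated as unmapped


for n in (1, 3, 10, 11):
    print(n, star_like_outcome(n))
```

So a read hitting exactly N positions is still reported in full, while one hitting N+1 positions disappears from the mapped set entirely.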
However, I'm working with `Hisat2` a lot recently. The relevant parameter for multimapper output is `-k`, which is similar to `Bowtie2` in `-k` mode. I only just now realized that even with `-k1`, `Hisat2` seems to output the reads that are multimappers; however, it outputs only one location. Am I correct in my understanding? Is there no option for the reads to be considered "unmapped" if they map to too many places?
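One way to check this on an actual output file might be to tally the `NH:i` tag over primary alignments. This is a rough sketch that assumes the SAM carries the optional `NH:i` tag (number of reported alignments for the read); the helper name `nh_distribution` and the demo records are made up, and whether `NH` still says anything useful under `-k1` is exactly the part I am unsure about.

```python
from collections import Counter


def nh_distribution(sam_lines):
    """Count primary, mapped alignment records by their NH:i value.

    Hypothetical helper; records without an NH tag are counted under None.
    For paired-end data each mate is counted separately.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue                      # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue                      # not a full alignment record
        flag = int(fields[1])
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue                      # unmapped, secondary, supplementary
        nh = None
        for tag in fields[11:]:
            if tag.startswith("NH:i:"):
                nh = int(tag[5:])
                break
        counts[nh] += 1
    return counts


# Made-up records purely for illustration.
demo = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1",
    "r2\t0\tchr1\t200\t1\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:3",
    "r2\t256\tchr2\t300\t1\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:3",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
print(dict(nh_distribution(demo)))
```

On a real file the lines could be fed in with `open("aligned.sam")`, where the file name is just a placeholder.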
Thank you in advance, as always.
-- Alex
The BBMap aligner offers `ambig=toss` to discard reads that multi-map. Discarded reads can be written to a file of unmapped reads.
I don't really want to get rid of multimappers completely, but it's curious that RNA-seq mappers behave differently by default. BBMap is not a dedicated RNA-seq aligner per se, so I would not be surprised with either behaviour. I guess `subread` is the only remaining truly popular RNA-seq mapper that would be interesting to evaluate for this.
Not so. It is a splice-aware aligner that will go head to head with any NGS aligner out there (the only thing it lacks compared to STAR is that it can't project the alignments into transcriptome space while aligning to the genome). The main issue is that there is no dedicated publication associated with the aligner, so you don't see it used as much.
BBMap actually allows you to be very flexible with multi-mappers. You can choose to do one of the following. It may be the only aligner that does this.
That's what I meant by "dedicated". I know it's a nice and very fast mapper, but popularity is a strong factor here.
I am not sure what the difference between "best" and "random" is, actually. I thought most modern mappers would only count something as a multimapper if it maps to several distinct positions with the same mapping quality, which automatically implies there is no best.
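To spell out what I mean, here is one possible reading of the three policies when several sites tie for the top score. This is purely an illustration and not BBMap's actual logic: the function `resolve_ties` is hypothetical, and treating "best" as "keep the first top-scoring site encountered" is an assumption on my part.

```python
import random


def resolve_ties(hits, policy, rng=random):
    """Pick which alignments to report for one read (hypothetical sketch).

    hits:   list of (locus, score) in the order the aligner found them.
    policy: "toss", "best" or "random"; only matters when the top score is tied.
    """
    if not hits:
        return []
    top = max(score for _, score in hits)
    tied = [h for h in hits if h[1] == top]
    if len(tied) == 1:
        return tied                  # a unique best hit: not a multimapper
    if policy == "toss":
        return []                    # discard the ambiguous read
    if policy == "best":
        return [tied[0]]             # assumed: first top-scoring site found
    if policy == "random":
        return [rng.choice(tied)]    # any one of the tied sites
    raise ValueError(f"unknown policy: {policy}")


# Made-up hits: two sites tie at score 50, a third is clearly worse.
hits = [("chr1:100", 50), ("chr2:200", 50), ("chr3:300", 40)]
for policy in ("toss", "best", "random"):
    print(policy, resolve_ties(hits, policy))
```

Under this reading, "best" and "random" differ only in which of the tied sites is kept, which is why the distinction looks thin to me.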
Otherwise, that's kind of the point of my post - all of the mappers seem to have some subtle differences.
`Hisat2` does not have an option to toss multimappers completely (perhaps I missed something?), and `STAR` has a totally unique approach of outputting multimappers that map to up to N places, and considering all reads that map to >N unmapped.
I interpret it as "choose the first site the read maps to well" for `best` and then stop looking.
Strong factor for what?
But then you would not know if it's really _best_, would you?
For choosing the tool to work/publish with. Popular and well-supported mappers have had hundreds of small bugs and issues fixed, and it is easier to just point to a publication if questions arise during peer review.