Question

Reporting of multimapping reads by STAR and Hisat2

1

4 months ago

predeus ★ 2.1k

Hi all,

I was wondering if anyone has looked at this issue in detail. From using STAR for a while, and (_as much as I can remember!_) Tophat/Tophat2 as well, I remember that RNA-seq mappers differ from genomic mappers in how they output multimappers. STAR does the following:

- if the read maps to a few distinct regions (the limit is set by --outFilterMultimapNmax; I think it is 10 by default), all of the positions are output, and the read is considered mapped;
- if the read maps to more than N positions, the read is considered unmapped and is reported as such.
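
To make the rule above concrete, here is a minimal Python sketch. It only models the behaviour as described in this post; `star_like_report` and its arguments are hypothetical names, and this is not STAR's actual code.

```python
def star_like_report(loci, multimap_nmax=10):
    """Model the reporting rule described above (illustrative only).

    loci: list of candidate mapping positions found for one read.
    multimap_nmax: stands in for --outFilterMultimapNmax (assumed default 10).
    Returns (status, reported_loci).
    """
    if not loci:
        # No alignment found at all.
        return "unmapped", []
    if len(loci) > multimap_nmax:
        # More than N positions: the read is treated as unmapped.
        return "unmapped", []
    # Up to N positions: every position is reported.
    return "mapped", list(loci)
```

For example, a read with 10 candidate positions is reported at all 10, while a read with 11 is reported as unmapped.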

However, I have been working with Hisat2 a lot recently. The relevant parameter for multimapper output is -k, which is similar to Bowtie2's -k mode. I only just realized that even with -k1, Hisat2 seems to output reads that are multimappers; it just outputs only one location for each. Am I correct in my understanding? Is there no option for reads to be considered "unmapped" if they map to too many places?

Thank you in advance, as always.

-- Alex

STAR Hisat2 multimapper • 935 views

ADD COMMENT • link updated 4 months ago by rfran010 ★ 1.3k • written 4 months ago by predeus ★ 2.1k

0

The BBMap aligner offers ambig=toss to discard reads that multi-map. Discarded reads can be written to a file of unmapped reads.

ADD REPLY • link 4 months ago by GenoMax 147k

0

I don't really want to get rid of multimappers completely, but it's curious that RNA-seq mappers behave differently by default. BBMap is not a dedicated RNA-seq aligner per se, so I would not be surprised by either behaviour. I guess subread is the only remaining truly popular RNA-seq mapper that would be interesting to evaluate for this.

ADD REPLY • link 4 months ago by predeus ★ 2.1k

0

> BBMap is not a dedicated RNA-seq aligner per se

Not so. It is a splice-aware aligner that will go head to head with any NGS aligner out there (the only thing it lacks compared to STAR is that it can't project the alignments into transcriptome space while aligning to the genome). The main issue is that there is no dedicated publication associated with the aligner, so you don't see it used as much.

BBMap actually allows you to be very flexible with multi-mappers. You can choose to do one of the following. It may be the only aligner that does this.

ambiguous=best          (ambig) Set behavior on ambiguously-mapped reads (with 
                        multiple top-scoring mapping locations).
                            best    (use the first best site)
                            toss    (consider unmapped)
                            random  (select one top-scoring site randomly)
                            all     (retain all top-scoring sites)
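
As a rough illustration of how these four modes differ, here is a small Python sketch. It follows only the help text above and is not BBMap's implementation; `resolve_ambiguous` and its parameters are hypothetical names.

```python
import random

def resolve_ambiguous(top_sites, mode="best", rng=None):
    """Pick which top-scoring sites to report for one read (illustrative only).

    top_sites: equally top-scoring mapping locations, in discovery order.
    mode: one of "best", "toss", "random", "all" (names taken from the help text).
    rng: optional random.Random instance, used only by mode="random".
    """
    if not top_sites:
        return []
    if mode == "best":
        # Use the first best site.
        return [top_sites[0]]
    if mode == "toss":
        # Consider the read unmapped.
        return []
    if mode == "random":
        # Select one top-scoring site at random.
        rng = rng or random.Random()
        return [rng.choice(top_sites)]
    if mode == "all":
        # Retain all top-scoring sites.
        return list(top_sites)
    raise ValueError(f"unknown mode: {mode}")
```

In this sketch, "best" is deterministic for a given discovery order, while "random" can return a different site on each run unless a seeded `rng` is passed in.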

ADD REPLY • link 4 months ago by GenoMax 147k

0

That's what I meant by "dedicated". I know it's a nice and a very fast mapper, but popularity is a strong factor here..

I am not sure what is the difference between "best" and "random", actually. I thought most modern mappers would only count something as a multimapper if it maps to several distinct positions with the same mapping quality, which automatically implies there is no best.

Otherwise, that's kind of the point of my post - all of the mappers seem to have some subtle differences. Hisat2 does not have an option to toss multimappers completely (perhaps I missed something?), and STAR has a totally unique approach of outputting multimappers that map to up to N places, and considering all reads that map to >N unmapped.

ADD REPLY • link 4 months ago by predeus ★ 2.1k

0

> I am not sure what is the difference between "best" and "random",

I interpret it as "choose the first site read maps to well" for best and then stop looking.

> but popularity is a strong factor here..

Strong factor for what?

ADD REPLY • link 4 months ago by GenoMax 147k

0

> I interpret it as "choose the first site read maps to well" for best and then stop looking.

But then you would not know if it's really _best_, would you?

> Strong factor for what?

For choosing the tool to work/publish with. Popular and well-supported mappers have had hundreds of small bugs and issues fixed, and it is easier to just point to a publication if questions arise during peer review.

ADD REPLY • link 4 months ago by predeus ★ 2.1k

score 1 · Answer 1 · 2024-07-01

1

4 months ago

rfran010 ★ 1.3k

You should be able to use the MAPQ score to filter out multimappers. If I remember correctly, bowtie2/hisat2 roughly report MAPQ as -10 log10 Pr(mapping position is wrong).

So if it finds two equal matches, the score should be 3 or lower...
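
The arithmetic behind that "3 or lower" can be checked with a short Python sketch. It assumes that n equally good matches give a probability of 1 - 1/n that the reported position is wrong, and that scores are capped at 60; both are simplifying assumptions, and real aligners bin MAPQ in their own way, so check the values your aligner actually emits.

```python
import math

def mapq_from_error_prob(p_wrong, cap=60):
    """MAPQ = -10 * log10(P(mapping position is wrong)), rounded and capped.

    The cap of 60 is an assumption made for this sketch.
    """
    if p_wrong <= 0:
        return cap
    return min(cap, round(-10 * math.log10(p_wrong)))

def mapq_for_equal_hits(n_hits):
    """Idealized MAPQ when a read has n_hits equally good matches."""
    # With n equal matches, any single reported position is wrong with p = 1 - 1/n.
    return mapq_from_error_prob(1 - 1 / n_hits)
```

With two equal matches the error probability is 0.5, and -10 * log10(0.5) is about 3.01, so `mapq_for_equal_hits(2)` gives 3; more matches push the value further toward 0.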

The problem arises when you want to include multimappers: hisat2 is not good for that, since you need to use the -k mode, which is slower, and I don't think it's optimized for finding many matches.

ADD COMMENT • link 4 months ago by rfran010 ★ 1.3k

1

I don't actually want to filter multimappers - you are correct, it's not hard to filter them out, and I think NH is a more robust way of doing it than the mapping quality (although I am not sure why I remember it this way).

My observation was more about the default behaviour of each mapper.

ADD REPLY • link 4 months ago by predeus ★ 2.1k

0

I see. Then I would say you are correct that hisat2 cannot filter and report "too many mapped" like STAR. The -k option is really just there to allow the output of more alignments than the "best" one. For example, -k1 doesn't really make sense, since that is sort of the default behavior, but it may be worse since the search terminates after finding 1 match.

ADD REPLY • link 4 months ago by rfran010 ★ 1.3k

0

Although I wonder if --max-seeds could be used to avoid reporting reads with too many alignments... it might affect sensitivity, though.

ADD REPLY • link 4 months ago by rfran010 ★ 1.3k

0

Actually, I believe Hisat2 also uses the NH tag to report the number of matches, so you could filter with that as well.
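
A minimal sketch of NH-based filtering on plain-text SAM lines, using only the Python standard library. The function names are hypothetical, and it assumes the aligner writes an `NH:i:<n>` optional field on each alignment record; records without the tag are dropped here.

```python
def nh_value(sam_line):
    """Return the integer NH tag of a SAM alignment line, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns are mandatory; optional TAG:TYPE:VALUE fields follow.
    for tag in fields[11:]:
        if tag.startswith("NH:i:"):
            return int(tag[len("NH:i:"):])
    return None

def keep_unique(sam_lines):
    """Yield header lines plus alignments whose NH tag equals 1."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header lines pass through unchanged
        elif nh_value(line) == 1:
            yield line  # uniquely mapped according to NH
```

To keep multimappers up to some limit instead, the `== 1` test could be swapped for a comparison against a threshold of your choosing.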

ADD REPLY • link 4 months ago by rfran010 ★ 1.3k