Hisat2 multimapped reads
1
0
Entering edit mode
6.4 years ago
corend ▴ 70

I work with paired RNA-seq data from 2 conditions in 1 species.

I aligned those data on the reference genome using hisat2. I am interested on retrieving reads concordantly mapped only once in the genome.

I filtered the SAM output using this post, but here is the problem : They say that ZS field corresponds to multimapped read. Also in hisat2 documentation they say that field NH:i: corresponds to number of locations where the read is mapped.

In my data, I observe pairs with the first one having no ZS field and having NH:i:1. And the second in pair having ZS and NH:i:1. How is it possible ? Then, when I use the filter based on 0x2 and ZS I keep the first read of the pair, when I would like to keep both or not to keep them at all.

Exemple reads :

K00114:593:HHWHKBBXX:1:2212:10094:37941 99      NC_019867.2     30693395        60      2S77M88N21M     =       30693461        120     GTCGACCAGAGGCGTACAGGGACAGCACGGCCTGGATGGCCACATACATGGCAGGAGAGTTGAAGGTTTCAAACATGATCTGTGTCATCTTCTCCCTGTT    A<AFFJJFJFFAJJF7F-FFJFFFJJFJFJFFJJJJFJJJAJFJJJJJFJJJJJJJJAFFJJJFAF<JFJJJJJ7JA-AJFFFJAFJJJJJJJJJJFAAA    AS:i:-2 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:98 YS:i:0  YT:Z:CP XS:A:-  NH:i:1
K00114:593:HHWHKBBXX:1:2212:10094:37941 147     NC_019867.2     30693461        60      11M88N84M       =       30693395        -120    TCAAACATGATCTGTGTCATCTTCTCCCTGTTGGCTTTGGGGTTGAGGGGAGCCTCGGTCAGCAGGACTGGATGCTCTTCAGGTGCGACTCTCAG FFJFJJ<FJFJJJFJAFJAJJJAJJAA-JJFJAA<-<-<JJJFJFJFFJJJAJJJJFJA<<A7FFF<AJJFJFJJJJJJJJJFJJJFAJJFAAAA AS:i:0  ZS:i:-16        XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:95 YS:i:-2 YT:Z:CP XS:A:-  NH:i:1

When I use the first post to parse my SAM am I really getting uniquely concordantly mapped reads ?

RNA-Seq alignment hisat2 • 4.4k views
ADD COMMENT
0
Entering edit mode

You could start filtering the file right after alignment, before you sort by coordinate, and then pipe this into samtools fixmate. Given you removed one, but not the other mate, fixmate will correct the FLAG and mark the remaining read as unpaired. When you then pipe this into samtools view and discard unpaired reads, you'll get rid of filtering-induced singletons. The question is if this is worth the efford, depending on your downstream. What do you want to do with the data? Make a count matrix?

ADD REPLY
0
Entering edit mode

In that case, I get rid of pairs mapping concordantly once with one read of the pair with multiple alignments ? If a pair has only one concordant mapping I would like to keep it even if one of the read maps somewhere else also.

ADD REPLY
2
Entering edit mode
6.4 years ago

Do not filter by the presence of a ZS auxiliary tag, just filter by MAPQ and/or flags. Just because one read of a pair can theoretically map to multiple places doesn't mean that the fragment can map to multiple places. Having one read on the pair with a definite alignment decreases the number of places that its mate can map.

ADD COMMENT
0
Entering edit mode

Just because one read of a pair can theoretically map to multiple places doesn't mean that the fragment can map to multiple places.

Exactly that is what I want to dodge.

If a fragment has two alignments with good MAPQ, filtering by MAPQ will make me keep the pair when I don't want it no?

Edit: I am reading more about MAPQ field. I saw that in bowtie multimapped reads have a MAPQ of 0. But in Hisat2 according to this, there is no real threshold.

In my data I have:

7553803 0
6652596 1
53838185 60

Edit2: I figured it out: I need to filter on 0x2 and MAPQ == 60 to get the same percentage as in Hisat2 alignment report.

ADD REPLY
0
Entering edit mode

If a fragment has two biologically plausible alignments then its MAPQ won't be good to begin with.

ADD REPLY

Login before adding your answer.

Traffic: 1592 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6