BWASW and _alt alleles
1
0
Entering edit mode
6.0 years ago

BWA-SW does not find all matches for me, it seems to pick only one random best match. I have no idea how to modify the arguments to have BWA find the secondary matches.

Example:

temp.fa:

>hg38-chr6-test
AACACAGGCCGGACAGAAGCTTGGAAGGTCCTGTCTCCCCAGGGAGGAGGCCCCTGGGACAGTGTGGCTCGTGTCCTTCC
CAACGGCTCCCTCTTCCTTCCGGCTGTCGGGATCCAGGATGAGGGGATTTTCCGGTGCCAGGCAATGAACAGGAATGGAA
AGGAGACCAAGTCCAACTACCGAGTCCGTGTCTACC

This input sequence has five identical matches according to BLAT:

YourSeq   196     1   196   196   100.0%  chr6_ssto_hap7  -     3499060   3499255    196
YourSeq   196     1   196   196   100.0%  chr6_qbl_hap6   -     3412352   3412547    196
YourSeq   196     1   196   196   100.0%  chr6_mann_hap4  -     3494156   3494351    196
YourSeq   196     1   196   196   100.0%  chr6_cox_hap2   -     3622013   3622208    196
YourSeq   196     1   196   196   100.0%  chr6            -    32151332  32151527    196

but just a single match with bwa-sw when I run this command:

bwa bwasw hg38.fa temp.fa

Output:

hg38_dna        16      chr6_GL000253v2_alt     3488536 0       196M    *       0       0       GGTAGACACGGACTCGGTAGTTGGACTTGGTCTCCTTTCCATTCCTGTTCATTGCCTGGCACCGGAAAATCCCCTCATCCTGGATCCCGACAGCCGGAAGGAAGAGGGAGCCGTTGGGAAGGACACGAGCCACACTGTCCCAGGGGCCTCCTCCCTGGGGAGACAGGACCTTCCAAGCTTCTGTCCGGCCTGTGTT    *       AS:i:196        XS:i:196        XF:i:1  XE:i:0  NM:i:0

BWA versions tested: 0.7.5a-r405, 0.7.17-r1194-dirty

I tried with the "-c 0.0001" argument, makes no difference.

bwa blat alignment • 2.1k views
ADD COMMENT
0
Entering edit mode

It has a MAPQ of 0, indicating a multimapper, which fits the BLAT result. This is normal behaviour that multimappers get a MAPQ of 0 and a random (out of all found matches) location being send to output. Is there a specific reason why you need these multimappers?

ADD REPLY
0
Entering edit mode

Oh wow, yes, it's indeed there, just not documented under bwasw but bwa backtrack: "Repetitive hits will be randomly chosen." manytThanks!!

Well, yeah, I need to know the best match for the input sequence and the _alt match is definitely not a very useful match. I thought that BWA knew better these days how to deal with the _alt chroms.

Non-unique matches are very common as soon as you run into chrom regions that have an _alt... I am surprised that this has not come up before somewhere else. I wonder what the normal way to deal with this is, for bwa mem...

ADD REPLY
0
Entering edit mode

I think you should use bwa mem: 1) I am not sure bwa sw is alt-aware, but I am sure bwa mem is, and 2) bwa mem is the recommended algorithm for reads longer than 70bp. Also check this post:

Bwa: How To Get Multiple Hits Using The Algorithm Bwasw?

ADD REPLY
0
Entering edit mode

In this case, my input sequence can be <= 3000 bp.... but you're right, maybe I should use BWA-mem for this? I used bwasw because the input is pretty long, by BWAMEM standards.

ADD REPLY
0
Entering edit mode

bwa mem can align long sequences, but minimap2 (also developed by Heng Li) is preferred for such sequences, as it is faster and more sensitive.

edit: the post Minimap2 and the future of BWA is an interesting read

ADD REPLY
1
Entering edit mode
6.0 years ago

So the answer seems to be (see comments above):

bwa-sw does not support the XA tag. As such, you cannot get the alternative locations for multi-mapping sequences. This makes bwasw quite useless for most real-world applications that I can imagine, tasks that do not involve mapping reads to a genome.

It does have an option to get alternative locations (-z) but that option seems to be broken, see Bwa: How To Get Multiple Hits Using The Algorithm Bwasw?

I was using it to get the best location of a sequence that the user inputs on a website. No, BLAT is not the best tool here, BLAT is relatively slow to start and no, BLAT's gfServer is not an option either, because that requires a lot of RAM. BWA is quite ideal, because I have the indexes of that already on disk. minimap2 is not an option for me, I try to stay away from very recent algorithms and also I have the bwa indexes already and don't want to re-index all my genomes. I'll try to move to BWA-MEM.

Thanks everyone for the comments! I learned a lot!

ADD COMMENT

Login before adding your answer.

Traffic: 2279 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6