I am using 'bwa' to map Illumina reads to a large database and would like to be able to store (and analyse) the information on multiple hits. I have forced this very brutishly using the following:
bwa samse -n 1000 database.fasta aln_sa.sai reads.fasta > aln_1000.sam
However, the output is no longer in classical .sam format (i.e. no flags, no CIGAR string, ...). Would someone know how to force a real .sam output?
Just as an example, here is the kind of output I get:
>1-815511 3 3
bta-mir-21 +8 0
mRNA___ENSBTAG00000029897|ENSBTAT00000042276|bta-mir-21|bta-mir-21 +18 0
19 +11033079 0
>2-592417 3 3
bta-mir-21 +8 0
mRNA___ENSBTAG00000029897|ENSBTAT00000042276|bta-mir-21|bta-mir-21 +18 0
19 +11033079 0
...
What I have tried so far with no luck:
- use the xa2multi.pl script which parses the information in the XA:Z: (information on other hits when their number is inferior to the value of the -n option) to output one line per hit -- this scripts is now part of the 'bwa' package.
Unfortunately, the output of a normal:
bwa samse database.fasta aln_sa.sai reads.fasta > aln.sam
does not contain any XA:Z: field... Even for read 2-592417 which has several hits:
1-815511 0 mRNA___ENSBTAG00000029897|ENSBTAT00000042276|bta-mir-21|bta-mir-21 18 0 23M * 0 0 TAGCTTATCAGACTGATGTTGAC * XT:A:R NM:i:0 X0:i:3 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:23
2-592417 0 19 11033079 0 22M * 0 0 TAGCTTATCAGACTGATGTTGA * XT:A:R NM:i:0 X0:i:3 X1:i:0 XM:i:0 XO:i:0 XG:i:0MD:Z:22
- install the previous bwa-0.5.7 version for which a patch doing exactly what I need has been developed at Broad Institute. But the patch has vanished from its ftp site and is now unavailable.
Any ideas?
I just tried this on a test data generated from the yeast genome and bwa version 0.5.9 seems to fill out all the SAM fields as expected - what version are you running?
I was actually running 0.5.5 (!) but have now installed 0.6.1 and will run it to see if the problem is solved. Thanks a lot!
Perfect! In the latest bwa version, the output is in a fully populated SAM file (XA:Z: field is now present). I just then run the xa2multi.pl on it to obtain one hit per line.
Could you please add this as an answer. Thanks.