Entering edit mode
12 months ago
tnminh89
▴
10
I am trying to extract unaligned reads from bam files and then realign them back to hg38. Here are the commands for these steps:
samtools view -@ 16 -b -F 2 $bamfile | samtools sort -@ 16 -n - | samtools fastq - -1 unaligned.fq1 -2 unaligned.fq2
bwa aln -k 3 -n 3 -t 16 -i 20 hg38.fa unaligned.fq1 > 1.sai
bwa aln -k 3 -n 3 -t 16 -i 20 hg38.fa unaligned.fq2 > 2.sai
bwa sampe hg38.fa 1.sai 2.sai unaligned.fq1 unaligned.fq2 | samtools view -b -@ 16 - | samtools sort -@ 16 - > realigned.bam
samtools index realigned.bam
In the last step where I tried to align unaligned.fq1
and unaligned.fq2
back to hg38, the output sam file keep having this error:
[W::sam_parse1] mapped query cannot have zero coordinate; treated as unmapped
I have tried to update to the latest version of bwa (0.7.17) but it didn't fix the problem. And the samtools I used is samtools/1.16.1.
Is this a problem with bwa or is this a problem with samtools? Or something else that I don't know about?
If you write the output of
bwa sampe
to a sam file, does it have mapped queries with the POS field set to 0?Also you can skip the
samtools index
step by doing:samtools sort -@ 16 --write-index - -o realigned.bam
This has been a known bug of some aligners for a while, but I don't know if bwa still falls into this category. One cause of this was reads which aligned against the end of one chromosome and the start of the next (as aligners often just concatenate all chromosomes together for the initial seed and align stage). Such cases were detected and then fixed up to become unaligned, but poorly so - leaving cigar, mapq, etc.
You could validate this by looking at the actual alignments. You can also ignore it, or use
samtools fixmate -z all
(or picard CleanSam IIRC) to clean up such alignments.