Hi
I have BWA-MEM alignments that are giving me a lot of noise in subsequent analyses. If I use bwa aln, I can filter the files by MAPQ and keep only the reads that carry the tag XT:A:U, and this solves the noise problem; however, bwa aln is too slow to map all my data. How can I filter the BWA-MEM alignments, given that this tag is not present?
My first filter for the BWA-MEM alignments was by MAPQ, and this gives me uniquely mapped reads (flag 256 is not present); however, my analyses are still noisy. I also have reads with several mismatches, which I suspect are the problem. I can filter them using the NM:i tag, but I am wondering whether there is a better way to filter my files and obtain more reliable alignments.
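To make the filtering described above concrete, here is a minimal sketch in plain Python that works on SAM text lines. It is only an illustration under stated assumptions, not a recommended pipeline: the function names (`parse_tags`, `keep_alignment`, `filter_sam`) and the thresholds (`min_mapq=30`, `max_nm=3`) are hypothetical placeholders, and using the absence of the XA:Z (alternative hits) and SA:Z (supplementary/chimeric) tags as a stand-in for XT:A:U is an assumption, not an exact equivalent.

```python
def parse_tags(fields):
    """Return the optional SAM tags (columns 12+) as a dict, e.g. {'NM': 2}."""
    tags = {}
    for field in fields:
        name, typ, value = field.split(":", 2)
        # Integer-typed tags (type 'i') are converted; everything else stays text.
        tags[name] = int(value) if typ == "i" else value
    return tags


def keep_alignment(sam_line, min_mapq=30, max_nm=3):
    """Decide whether one SAM alignment line passes the filters.

    Thresholds are hypothetical placeholders, not recommended values.
    """
    cols = sam_line.rstrip("\n").split("\t")
    flag = int(cols[1])
    mapq = int(cols[4])
    tags = parse_tags(cols[11:])

    # Drop unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
    if flag & (0x4 | 0x100 | 0x800):
        return False
    # Drop low mapping quality.
    if mapq < min_mapq:
        return False
    # Assumption: reads reporting alternative hits (XA) or chimeric
    # parts (SA) are treated as not uniquely placed.
    if "XA" in tags or "SA" in tags:
        return False
    # Drop reads with too many mismatches/indels (edit distance, NM:i).
    if tags.get("NM", 0) > max_nm:
        return False
    return True


def filter_sam(lines, **kwargs):
    """Yield header lines unchanged and only the alignments that pass."""
    for line in lines:
        if line.startswith("@") or keep_alignment(line, **kwargs):
            yield line
```

One way to use it would be to pipe `samtools view -h` output through a small script that calls `filter_sam(sys.stdin)` and writes the surviving lines to stdout. Since 150 bp reads can carry more mismatches than 100 bp reads at the same error rate, the NM cut-off probably needs to be scaled to read length rather than fixed.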
Best wishes
What subsequent analyses are you doing? How do you know the alignment is causing "noise"?
I am calculating the D statistic (introgression) for maize. With the aln filtering, the results are similar to those of other individuals (a different maize race) from the same environment; also, if I use a masked reference genome for the alignment, the "noise" disappears. So it is very possible that the repetitive regions of the genome, and alignments to those regions, are responsible for the "noise". However, this only happens in these newly sequenced individuals; previous individuals do not need the filtering and their D statistics are normal. Could it also be a sequencing problem? The new individuals have longer reads (150 bp), while the previous ones have 100 bp reads.
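For readers unfamiliar with the statistic, the D statistic (ABBA-BABA test) in its simplest count-based form is the normalized difference between the number of ABBA and BABA site patterns. The sketch below only illustrates that formula; the function name `d_statistic` and the example counts are hypothetical and are not taken from the data discussed here.

```python
def d_statistic(n_abba, n_baba):
    """Count-based D: (ABBA - BABA) / (ABBA + BABA)."""
    total = n_abba + n_baba
    if total == 0:
        raise ValueError("no ABBA or BABA sites to compare")
    return (n_abba - n_baba) / total


# Hypothetical counts, for illustration only.
example_d = d_statistic(60, 40)
```

Because D depends only on the balance between the two site patterns, spurious alignments in repetitive regions that inflate one pattern more than the other would shift it, which is consistent with the effect of masking described above.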