I want to filter out reads that are soft clipped (aligned with bwa mem) > 70 bps, and I'm using the --read-mismatch-limit 70 option to achieve this. However, in the output I see SNPS being called from reads with > 80 bps mis-aligned (see linked screenshot):
Is this option doing what I think it is? Is this the correct option to exclude reads with more than N mismatches?
There is a terminology issue here. In NGS analysis parlance soft clipping is not the equivalent to mismatching bases.
The soft clipped region are not aligned for matching or mismatching regions - these are removed altogether as the alignment score is better when the region is not present. So their alignment is not reported. You could have matching bases within a soft clipped region.
Therefore in general, in my opinion, the soft clipped regions of reads should not be used to call SNPs at all.
Hence I think this flag refers to the actual mismatching bases and not the soft clipped ones.
-U --read-mismatch-limit N
Exclude reads with more than N mismatches where each mismatch
has base quality >= mismatch-base-quality-threshold.
default: ~unbounded
OK makes sense - do you have any suggestions for how to exclude soft-clipped reads (other than removing them from the BAM in the first place)?
If you want to exclude soft-clipped reads, then why not remove them from the bam?
Because I need them included in another analysis, and I'd prefer not to have to take up twice as much space by filtering them just for Freebayes.
can you please test if freebayes can read sub-bash file ? something like
just to know if you can put an inline filter in your command
Nice idea, but this doesn't work. I get the error:
Failed to load index for /dev/fd/62. Rebuild samtools index