Apologies if this is a simplistic question. I originally asked this in the Galaxy Biostars channel but there was no answer, so I thought I'd try my luck here.
I am working on a variant-calling workflow for paired-end DNA-seq reads, with BWA-MEM as my aligner. To achieve high confidence when aligning to a polyploid genome, I would like to generate an ungapped alignment.
I was just wondering if there are any maximum values for scoring I should adhere to. What's the highest penalty score I can assign to -O? In the bwa manual, there are values in square brackets behind the different parameters. In the case of -O, it says "-O INT Gap open penalty [11]". Does this mean the maximum penalty score I can assign is 11, or does this represent something else?
Hello Macspider,
Thank you for your response. I gather there is no maximum penalty score, so I should be safe setting -O to something ridiculously high, like 100, while keeping the default X-dropoff value (-d). Seed extension should then stop as soon as a gap is encountered, because the difference between the best and the current extension score would be too large. I was worried that setting -O too high would somehow affect other scoring options, but that doesn't really make sense now that I think about it. It should only affect seed extension, which is what I want.
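For concreteness, here is a minimal Python sketch of how such a command line might be assembled. The helper name `build_bwa_mem_cmd` and the file names are hypothetical; `-O` (gap open penalty) and `-t` (threads) are existing `bwa mem` options, and `-d` is simply left at its default. I have not verified that this setting removes every gap, so it would still be worth checking the CIGAR strings in the output.

```python
import shlex

def build_bwa_mem_cmd(reference, reads_1, reads_2, gap_open=100, threads=4):
    """Build a bwa mem argument list with a very high gap open penalty.

    The helper and its defaults are illustrative assumptions, not part of bwa.
    -d (Z-dropoff) is deliberately not passed, so bwa uses its default.
    """
    return [
        "bwa", "mem",
        "-t", str(threads),   # number of threads
        "-O", str(gap_open),  # gap open penalty
        reference, reads_1, reads_2,
    ]

# Hypothetical file names, for illustration only.
cmd = build_bwa_mem_cmd("ref.fa", "reads_R1.fq", "reads_R2.fq")
print(shlex.join(cmd))

# To actually run it (bwa must be installed and the reference indexed):
# import subprocess
# with open("aln.sam", "w") as out:
#     subprocess.run(cmd, stdout=out, check=True)
```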
Another idea that might be fruitful for you is to calculate the penalties from the expected mutation rate between the two species you're mapping (reads and reference), and then filter the final SAM file according to your needs with a homebrew script and/or samtools.
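As a rough idea of what such a homebrew filter could look like, here is a minimal Python sketch that keeps only alignments whose CIGAR string contains no insertion (I) or deletion (D). The function names are made up for illustration, and the choice to drop unmapped reads and to treat soft clips as acceptable is an assumption you may want to change.

```python
import re

# CIGAR operations that introduce a gap: I (insertion) and D (deletion).
GAP_OPS = re.compile(r"\d+[ID]")

def is_ungapped(sam_line):
    """Return True if a SAM alignment line has a CIGAR without I or D."""
    fields = sam_line.rstrip("\n").split("\t")
    cigar = fields[5]  # column 6 of a SAM record is the CIGAR string
    if cigar == "*":   # no alignment available (e.g. unmapped read)
        return False
    return GAP_OPS.search(cigar) is None

def filter_ungapped(lines):
    """Yield header lines unchanged and only the ungapped alignment lines."""
    for line in lines:
        if line.startswith("@") or is_ungapped(line):
            yield line

# Example usage on a file (hypothetical file names):
# with open("aln.sam") as src, open("aln.ungapped.sam", "w") as dst:
#     dst.writelines(filter_ungapped(src))
```

Note that this looks at each record on its own, so one mate of a pair can be removed while the other is kept; if you need intact pairs, you would have to handle that separately.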