Hi Biostars,
I want to align reads from a non-model microbat genome to the repeat-masked version of the published microbat (Myotis lucifugus) genome and do variant calling. I have short-insert paired-end data generated on an Illumina NextSeq. The Myotis lucifugus genome has fairly gappy scaffolds and is, of course, a different, albeit closely related, species.
Is it more appropriate for me to align my reads as single-end or paired-end reads? BWA and other similar aligners penalize unpaired reads heavily by default. My concern is that I will have reads thrown out because their mate falls either within a masked repetitive region or in a gap of N's in the scaffold.
As a side note, when I run BWA with defaults, I get radically different mapping percentages depending on whether I align the reads as single-end (~60% mapping) or paired-end (~85% mapping). Is this because BWA is penalizing the single-end reads for not having a mate? Would I fix this by reducing the -U penalty for an unpaired read pair from its default of 17 to 0?
Sorry that this is a few questions bundled together. This is also my first time posting here and I apologize if I've missed a rule.
Thanks. For perspective, only ~70% of my reads are mapping in proper pairs according to samtools flagstat. I was also working with a museum specimen rather than a fresh sample and did a much less stringent size selection to preserve material. I've used a pair-merging program previously which showed that at least ~30% of my reads are also overlapping (so in principle I could also map these as long single-ended reads rather than pairs). I'll definitely try it a couple of ways.
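For comparing the different runs, here is a minimal Python sketch of how the mapped and properly-paired percentages can be worked out from the FLAG column of a SAM file. The bit values come from the SAM specification; the function names (summarize_flags, flags_from_sam) are made up for illustration, and the sketch counts primary records only, so the numbers may not line up exactly with every line of samtools flagstat output.

```python
# SAM FLAG bits, as defined in the SAM specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def summarize_flags(flags):
    """Return (primary_reads, mapped_pct, proper_pair_pct) for SAM FLAG integers.

    Hypothetical helper: secondary and supplementary records are skipped so
    that each read is counted once, via its primary record.
    """
    total = mapped = proper = 0
    for flag in flags:
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        total += 1
        if not (flag & UNMAPPED):
            mapped += 1
            if (flag & PAIRED) and (flag & PROPER_PAIR):
                proper += 1
    if total == 0:
        return 0, 0.0, 0.0
    return total, 100.0 * mapped / total, 100.0 * proper / total


def flags_from_sam(path):
    """Yield the FLAG field of every alignment line in a plain-text SAM file."""
    with open(path) as handle:
        for line in handle:
            if line.startswith("@"):
                continue  # header line
            yield int(line.split("\t")[1])


if __name__ == "__main__":
    # Toy input: a proper pair (99, 147), a mapped read whose mate is
    # unmapped (73), that unmapped mate (133), and one supplementary record.
    print(summarize_flags([99, 147, 73, 133, 2147]))
```

With a real alignment, summarize_flags(flags_from_sam("paired.sam")) could be run once per mapping strategy (the file name is a placeholder) and the percentages compared side by side.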
70% isn't too bad, you know, particularly when you're mapping to a different species!! I wouldn't bother with my suggestion then, as it will cause headaches down the road. Many programs work with either PE or SE but can't handle both. I'd keep the 70% and be happy it wasn't 40% ;) Probably thanks to those long reads :)
Well, that's good to know! I did try mapping them both as single-end and as paired-end, and I also tried mapping the single-end reads with the default -U parameter (unpaired penalty) and with it set to zero. It made no difference, so I'm guessing BWA is smart enough to know when to use it and when not to. Paired did have a slightly higher mapping percentage than single, but I think I know why now. By default, bwa mem does mate rescue: if one read of a pair maps but the other's mapping quality is too low, it tries to align the other read again using the Smith-Waterman (SW) algorithm. If it passes SW, it still gets mapped. I believe it only does this in paired-end mode.
Thanks again for your suggestions John, much appreciated.
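For anyone wanting to repeat the comparison, here is a rough Python sketch that only builds the bwa mem command lines for each run; it does not execute anything. The -U option (penalty for an unpaired read pair, default 17) and -S option (skip mate rescue) are taken from the bwa mem usage text. The file names and the bwa_mem_cmd helper are placeholders I made up, so adjust them to your own data.

```python
import shlex


def bwa_mem_cmd(ref, reads1, reads2=None, unpaired_penalty=None,
                skip_mate_rescue=False):
    """Build a bwa mem argument list (hypothetical helper; nothing is run)."""
    cmd = ["bwa", "mem"]
    if unpaired_penalty is not None:
        cmd += ["-U", str(unpaired_penalty)]  # penalty for an unpaired read pair
    if skip_mate_rescue:
        cmd.append("-S")  # skip mate rescue
    cmd += [ref, reads1]
    if reads2 is not None:
        cmd.append(reads2)  # a second FASTQ switches bwa mem to paired-end mode
    return cmd


# Placeholder file names, not from the original post.
REF = "myotis_masked.fa"
R1 = "sample_R1.fastq.gz"
R2 = "sample_R2.fastq.gz"

runs = {
    "single_end_R1": bwa_mem_cmd(REF, R1),
    "paired_default": bwa_mem_cmd(REF, R1, R2),
    "paired_U0": bwa_mem_cmd(REF, R1, R2, unpaired_penalty=0),
    "paired_no_rescue": bwa_mem_cmd(REF, R1, R2, skip_mate_rescue=True),
}

for name, cmd in runs.items():
    # Print a shell-ready command; redirect its stdout to a SAM file to run it.
    print(f"{name}: {shlex.join(cmd)} > {name}.sam")
```

Comparing paired_default against paired_no_rescue is one way to check how much of the paired-end gain really comes from mate rescue. Since -U is described as a penalty on read pairs, it would not be expected to matter in single-end runs, which fits what was seen above.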