Hi, I have recently been using BWA-MEM to align 150 bp paired-end reads from two cell lines: one derived from a human male (XY) lineage and the other from a female (XX) lineage. In the XY sample I have reads aligning to both X and Y; in the XX sample, only to X.
How does the algorithm know how to align reads across X and Y correctly, especially in the pseudo-autosomal regions (PARs), i.e. the tips of the p and q arms of X and Y, which "look" the same, just as the two copies of any autosome do?
Does BWA-MEM just align/distribute reads evenly when they could go to either the X PAR or the Y PAR? As far as I know, there is no option to include the karyotype in BWA-MEM, but I guess if you know the sex of the sample, you could supply either an XY ref.fa or an X-only ref.fa to mitigate what I have outlined.
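A minimal sketch of the X-only reference idea, in Python with only the standard library: it copies a FASTA while dropping the named contigs, and the resulting file would then be indexed with bwa index as usual before aligning. The function name filter_fasta is hypothetical, and the contig name chrY assumes UCSC-style naming (other builds may call it Y). Dropping chrY also drops the Y PARs, which is the intent for an XX sample.

```python
"""Copy a reference FASTA, skipping the listed contigs (e.g. chrY).

Sketch only: "chrY" assumes UCSC-style contig names; filter_fasta is a
hypothetical helper, not part of bwa or any other tool.
"""
from __future__ import annotations

import io
from typing import Iterable, TextIO


def filter_fasta(src: TextIO, dst: TextIO, drop: Iterable[str]) -> list[str]:
    """Write every record of src to dst unless its name is in `drop`.

    The record name is the first whitespace-delimited token after '>'.
    Returns the names of the records that were kept, in file order.
    """
    drop = set(drop)
    kept: list[str] = []
    keep_current = False  # lines before the first header are discarded
    for line in src:
        if line.startswith(">"):
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            keep_current = name not in drop
            if keep_current:
                kept.append(name)
        if keep_current:
            dst.write(line)  # header or sequence line of a kept record
    return kept


if __name__ == "__main__":
    # Tiny in-memory demo; a real run would open the genome FASTA instead.
    demo = io.StringIO(">chrX\nACGT\n>chrY\nTTGG\n>chrM desc\nGATC\n")
    out = io.StringIO()
    print(filter_fasta(demo, out, drop={"chrY"}))
    print(out.getvalue(), end="")
```

For a real run, pass open file handles for the reference and for an output file (say, a hypothetical ref_noY.fa), then build the index on the new file with bwa index ref_noY.fa.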
Do other labs or people use this approach?
Thanks.
What is the percent sequence identity between the X-PAR and the Y-PAR in hg19/hg38? I think bwa is agnostic to whatever you're aligning to, and that removes some bias, but others might have a better answer.

It does not know. If a read maps equally well to multiple locations (a multimapper), it gets a MAPQ of 0. If you can be sure that your samples do not contain chrY, you could remove it from the reference and build a new index. Still, I have never heard of anyone actually doing that. I personally align against the full genome, including the unplaced and random contigs plus the EBV decoy, but excluding the alternative (ALT) haplotypes.
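To check how many multimappers a given sample has, a minimal sketch (Python, standard library only) that reads SAM text, for example produced with samtools view -h, and counts per contig the primary alignment records (each mate of a pair counts once) and how many of them have MAPQ 0. The name mapq0_by_contig is hypothetical, and the sketch does not know the PAR coordinates, so it only gives a per-chromosome picture; comparing the MAPQ 0 counts on chrX and chrY between the XX and XY samples is one way to see where ambiguous reads ended up.

```python
"""Per-contig counts of primary alignments and of those with MAPQ 0.

Sketch only: input is plain SAM text; mapq0_by_contig is a hypothetical
helper and knows nothing about where the PARs are.
"""
import sys
from collections import Counter
from typing import Iterable, Tuple

# SAM FLAG bits for records that should not be counted as a read's placement
UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800


def mapq0_by_contig(sam_lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Return (total, zero): primary alignments and MAPQ 0 ones, per RNAME."""
    total, zero = Counter(), Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag, rname, mapq = int(fields[1]), fields[2], int(fields[4])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # keep only each record's primary placement
        total[rname] += 1
        if mapq == 0:
            zero[rname] += 1
    return total, zero


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        total, zero = mapq0_by_contig(fh)
    # Columns: contig, primary alignments, MAPQ 0 alignments, MAPQ 0 fraction
    for contig in sorted(total):
        frac = zero[contig] / total[contig]
        print(f"{contig}\t{total[contig]}\t{zero[contig]}\t{frac:.3f}")
```

Example run, with placeholder file names: samtools view -h aln.bam > aln.sam, then python mapq0.py aln.sam.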