How does BWA MEM know where to place X- and Y-derived reads correctly?
5.6 years ago
s1667153 • 0

Hi, I have recently been using BWA MEM to align 150 bp paired-end reads from two cell lines: one derived from a human male (XY) lineage and the other from a female (XX) lineage. In the XY case I have reads aligning to both chromosomes, and in the XX case only to X.

How does the algorithm know how to assign reads to X or Y correctly, especially in the pseudoautosomal regions (PARs, i.e. the tips of the X and Y p and q arms), which "look" the same on both chromosomes, just like a pair of autosomes?

Does BWA MEM just distribute reads evenly when they could go to either the X or the Y PAR? As far as I know there is no option to give BWA MEM the karyotype, but if you know the sex of the sample, I guess you could supply either an XY ref.fa or an X-only ref.fa to mitigate the problem I have outlined.
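The X-only reference idea can be sketched by dropping the chrY record from the FASTA before indexing. This is a minimal Python sketch, assuming UCSC-style sequence names (`chrY`; Ensembl-style files name it `Y`); the function name and the file names in the comments are hypothetical.

```python
import io
from typing import Iterable, List, TextIO


def write_fasta_without(src: TextIO, dst: TextIO, drop: Iterable[str]) -> List[str]:
    """Copy FASTA records from src to dst, skipping any record named in `drop`.

    A record's name is the first whitespace-delimited token after '>'.
    Lines before the first header are not copied.
    Returns the names of the records that were kept, in file order.
    """
    drop_set = set(drop)
    kept: List[str] = []
    keep_current = False
    for line in src:
        if line.startswith(">"):
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            keep_current = name not in drop_set
            if keep_current:
                kept.append(name)
        if keep_current:
            dst.write(line)
    return kept


if __name__ == "__main__":
    # Tiny in-memory example; for a real genome you would open e.g.
    # "ref.fa" for reading and "ref.noY.fa" for writing (hypothetical names).
    toy = ">chrX\nACGTNNACGT\n>chrY\nNNNNACGT\n>chr1\nGGGGCCCC\n"
    out = io.StringIO()
    print(write_fasta_without(io.StringIO(toy), out, drop={"chrY"}))
    print(out.getvalue(), end="")
```

The filtered file then needs its own index, e.g. `bwa index ref.noY.fa`.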

Do other labs or people use this approach?

Thanks.


What is the percent sequence identity between the X-PAR and Y-PAR in hg19/hg38? I think bwa is agnostic to whatever you're aligning to, and that removes some bias, but others might have a better answer.


It does not know. If reads map equally well to multiple locations (multimappers), they get a MAPQ of 0. If you can be sure that your samples do not contain chrY, you could remove it from the reference and build a new index. Still, I have never heard of anyone actually doing that. I personally align against the full genome, including unplaced and random contigs plus the EBV decoy, but excluding alternative (ALT) haplotypes.
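To see how many chrX/chrY alignments fall into this MAPQ 0 category in your own data, a small Python sketch can tally primary alignments per contig from SAM text. It assumes UCSC-style contig names and skips unmapped, secondary and supplementary records (FLAG bits 0x4, 0x100, 0x800); the function name is hypothetical. MAPQ 0 also occurs at other repeats, so this is only a coarse check.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# SAM FLAG bits: unmapped, secondary, supplementary.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def mapq0_by_contig(
    sam_lines: Iterable[str], contigs: Tuple[str, ...] = ("chrX", "chrY")
) -> Dict[str, Tuple[int, int]]:
    """Return {contig: (primary alignments, of which MAPQ == 0)} for `contigs`.

    Header lines ('@...') and unmapped, secondary and supplementary records
    are skipped, so each mapped mate is counted once.
    """
    total: Counter = Counter()
    mapq0: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # a SAM record has 11 mandatory columns
            continue
        flag, rname, mapq = int(fields[1]), fields[2], int(fields[4])
        if flag & (FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue
        if rname in contigs:
            total[rname] += 1
            if mapq == 0:
                mapq0[rname] += 1
    return {c: (total[c], mapq0[c]) for c in contigs}


if __name__ == "__main__":
    # Toy records; with real data you could feed in the output of
    # `samtools view aln.bam` instead (the BAM name is hypothetical).
    toy = [
        "@SQ\tSN:chrX\tLN:1000\n",
        "r1\t0\tchrX\t100\t0\t10M\t*\t0\t0\tACGTACGTAC\t*\n",
        "r2\t0\tchrY\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\t*\n",
        "r3\t256\tchrY\t100\t0\t10M\t*\t0\t0\tACGTACGTAC\t*\n",
    ]
    for contig, (n, n0) in mapq0_by_contig(toy).items():
        print(f"{contig}\t{n}\t{n0}")
```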

5.6 years ago

You should check this in the reference genome you are using, but most likely the PAR on chrY is hard-masked (converted to N nucleotides), so reads from that region will align to the chrX copy.
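One way to run that check is to pull chrY out of the reference FASTA and measure the fraction of N bases over the PAR. A minimal Python sketch, assuming 1-based inclusive coordinates and UCSC-style names; the helper names are hypothetical.

```python
import io
from typing import Optional, TextIO


def read_record(fasta: TextIO, name: str) -> Optional[str]:
    """Return the sequence of the FASTA record called `name`, or None if absent."""
    chunks = []
    found = False
    for line in fasta:
        if line.startswith(">"):
            if found:  # reached the header after our record: stop
                break
            tokens = line[1:].split()
            found = bool(tokens) and tokens[0] == name
        elif found:
            chunks.append(line.strip())
    return "".join(chunks) if found else None


def n_fraction(seq: str, start: int, end: int) -> float:
    """Fraction of N bases in seq[start..end], 1-based and inclusive."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("region outside sequence")
    region = seq[start - 1:end]
    return region.upper().count("N") / len(region)


if __name__ == "__main__":
    toy = ">chrX\nACGTACGT\n>chrY\nNNNNAC\nGTNN\n"
    chr_y = read_record(io.StringIO(toy), "chrY")
    print(n_fraction(chr_y, 1, 4), n_fraction(chr_y, 1, 10))
```

For a real reference you would pass `open("ref.fa")` (hypothetical name) and the chrY PAR coordinates for your build; for GRCh38, PAR1 on chrY is commonly given as chrY:10,001-2,781,479, but please confirm against the Genome Reference Consortium definitions. A value at or near 1.0 means the region is hard-masked. `samtools faidx ref.fa chrY:10001-2781479` is a faster way to extract the same region.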


It may not be hard-masked, judging by: "In the XY case I have reads aligning to both chromosomes, and in the XX case, only to X."
Edit: unless that statement does not cover the PAR. The OP will have to confirm.


This does seem to depend on which reference genome you use, as also discussed in this blog post from Heng Li. TL;DR: use a reference in which the chrY PAR is hard-masked.
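If switching references is not an option, a comparable effect could in principle be had by hard-masking the chrY PARs in your own copy of the reference and re-indexing with `bwa index`. A minimal Python sketch, assuming the chrY sequence fits in memory as a string; the helper names are hypothetical, and the PAR1/PAR2 coordinates must come from your build's documentation. Masking only chrY leaves the chrX copy intact, so PAR reads then have a single best hit on chrX.

```python
def hard_mask(seq: str, start: int, end: int) -> str:
    """Return a copy of seq with positions start..end (1-based, inclusive) set to 'N'."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("region outside sequence")
    return seq[: start - 1] + "N" * (end - start + 1) + seq[end:]


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one FASTA record, wrapping the sequence at `width` characters."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy chrY; in practice the sequence would come from the reference FASTA
    # and each PAR's start/end from the definitions for your build.
    masked = hard_mask("ACGTACGTAC", 3, 5)
    print(to_fasta("chrY", masked, width=4), end="")
```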
