Hi everyone,
I want to ask for opinions on dealing with non-uniquely mappable reads in pooled DNA sequencing.
Instead of sequencing individual samples, DNA from multiple samples is mixed and sequenced together in order to identify variants and/or estimate allele frequencies. My question is: how should non-uniquely mapped reads be handled in this application? If they are discarded, the allele-frequency estimates will almost certainly be biased, because one allele may be uniquely mappable while the other is not. What would be the potential problems if such reads were instead assigned to a random one of their best alignments, as BWA does?
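A toy calculation shows the bias from discarding. It assumes, hypothetically, a true pooled alt-allele frequency p, a fraction m_ref of ref-carrying reads that are multi-mapped, and a fraction m_alt of alt-carrying reads that are multi-mapped. The name `af_after_discard` is illustrative and does not come from any tool. When both alleles are equally mappable the estimate stays unbiased. When they differ, it does not.

```python
def af_after_discard(p, m_ref, m_alt):
    """Expected alt-allele frequency after dropping multi-mapped reads (toy model).

    p     -- true alt-allele frequency in the pool (assumed)
    m_ref -- fraction of ref-carrying reads that are multi-mapped (assumed)
    m_alt -- fraction of alt-carrying reads that are multi-mapped (assumed)
    """
    kept_alt = p * (1 - m_alt)          # alt reads that survive the filter
    kept_ref = (1 - p) * (1 - m_ref)    # ref reads that survive the filter
    total = kept_alt + kept_ref
    return kept_alt / total if total > 0 else float("nan")


# True frequency 0.3; ref reads always unique, alt reads increasingly multi-mapped.
for m_alt in (0.0, 0.5, 1.0):
    print(m_alt, round(af_after_discard(0.3, 0.0, m_alt), 3))
```

In the extreme case where every alt-carrying read is multi-mapped, the alt allele disappears entirely from the estimate.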
Thanks!
Throwing out non-unique reads does not necessarily give you a lower false positive rate. In fact, because using only uniquely mapped reads introduces bias, it may increase false positives. I look at this problem from two angles. If the purpose is to identify variants, then keeping only uniquely mapped reads is probably the right thing to do. But if the purpose is to estimate allele frequencies, non-uniquely mapped reads may help reduce bias.
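A minimal sketch of the two-angle approach, under stated assumptions. Each read with an equally good hit at the site is summarized by the allele it carries and its number of equally good hits. `Read`, `call_variant`, `estimate_af` and the `min_alt_reads=3` threshold are hypothetical, not from BWA or any variant caller. Detection uses uniquely mapped reads only. For frequency, `reads` is assumed to include every read with an equally good hit here, even ones that truly came from a paralogous copy. If each multi-mapped read is placed uniformly at random among its `n_hits` locations, it lands here with probability 1/`n_hits`, so weighting by 1/`n_hits` gives the expected counts under random placement without simulating it.

```python
from dataclasses import dataclass


@dataclass
class Read:
    allele: str   # "ref" or "alt" at the site of interest (hypothetical summary)
    n_hits: int   # number of equally good alignment locations


def call_variant(reads, min_alt_reads=3):
    """Detection: count alt evidence from uniquely mapped reads only."""
    alt_unique = sum(1 for r in reads if r.n_hits == 1 and r.allele == "alt")
    return alt_unique >= min_alt_reads


def estimate_af(reads):
    """Frequency: keep multi-mapped reads, each weighted by 1 / n_hits,
    i.e. its chance of being placed here under uniform random placement."""
    alt = sum(1.0 / r.n_hits for r in reads if r.allele == "alt")
    total = sum(1.0 / r.n_hits for r in reads)
    return alt / total if total > 0 else float("nan")


# Toy data: 6 unique ref reads, 2 unique alt reads, 4 alt reads with 2 hits each.
reads = [Read("ref", 1)] * 6 + [Read("alt", 1)] * 2 + [Read("alt", 2)] * 4
print(call_variant(reads))  # False: only 2 unique alt reads
print(estimate_af(reads))   # 0.4
```

With these toy reads the uniquely mapped reads alone give 2/8 = 0.25, while the weighted estimate is 0.4. The weighting does not separate out reads that originated at the paralog. Random placement pulls those in as well, which is one of the potential problems with the BWA-style approach.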