10.1 years ago
biogirl
Hi all,
Has anyone used PSMC (inferring historical population sizes) with whole genome sequence data from a haploid organism? All posts on here refer to using it with diploid data. A quick Google search revealed people creating 'fake' diploids by combining two haploid sequences, which I don't want to do.
Thanks
Not possible.
lh3 -- Is this true even for haploid sequences derived from organisms that have a diploid phase with recombination? I had planned on going down the "fake diploid" route for this, is there a reason I shouldn't?
Fake diploid works.
Thank you for your reply. Is this due to the theory limiting the software, or does PSMC currently not allow haploid data as an input?
The model is based on diploid genomes.
Ok, cool, thanks! I would've really liked to try PSMC as it seems like a great tool, but will look at other tools too.
I think the responses here are slightly misleading. A better way to think about it is that what PSMC usually does is split a diploid genome to create two fake haploids! The model that it is based on is actually one of haploid individuals, so if you ran it on a pair of haploid genomes, it would actually be in a sense more correct than the typical use -- nothing fake about it. (I'm assuming here that your haploids are something like yeast or HIV where recombination is occurring primarily via crossovers. If they're bacteria, then it really would be incorrect.)
Correct, my haploids are yeast. Thanks for your answer, I did consider this, but it's good to know that someone else thinks the same! I will give it a try.
PSMC uses the distribution of local pairwise heterozygosity to infer history. It completely ignores any phasing/haplotype information, so it is not right to say that running on a pair of haploid genomes is "more correct". Also, PSMC assumes a coalescent-with-recombination process. Not all fake diploids can be modeled by PSMC.
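To make "local pairwise heterozygosity" concrete, here is a minimal sketch of how two aligned haploid sequences could be reduced to a per-bin difference signal. It is only an illustration: the function name `pairwise_het_bins`, the `T`/`K`/`N` letters, the 100 bp default bin and the missing-data threshold are assumptions chosen for the example, loosely modeled on the binned input PSMC consumes, and not the tool's exact rules.

```python
def pairwise_het_bins(seq_a, seq_b, bin_size=100, max_missing_frac=0.9):
    """Encode two aligned haploid sequences as one character per bin.

    'K' = at least one difference in the bin, 'T' = no difference,
    'N' = too much missing data. Letters and threshold are illustrative.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    out = []
    for start in range(0, len(seq_a), bin_size):
        a = seq_a[start:start + bin_size].upper()
        b = seq_b[start:start + bin_size].upper()
        missing = 0
        het = False
        for x, y in zip(a, b):
            if x not in valid or y not in valid:
                missing += 1      # uncalled or ambiguous site in either genome
            elif x != y:
                het = True        # the two haploids differ at this site
        if missing > max_missing_frac * len(a):
            out.append("N")
        elif het:
            out.append("K")
        else:
            out.append("T")
    return "".join(out)


# Toy usage with a tiny bin size so the result is easy to read.
print(pairwise_het_bins("ACGTACGTACGT", "ACGTACGANNNN", bin_size=4))
```

Note that nothing in this signal records which haploid carried which allele, which is why phasing plays no role here.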
I wonder what your conclusion was on this question. I'm now trying something similar: merging two haploid genomes to produce a diploid sample, which I used to run MSMC/MSMC2. Is there a problem with that, and what are the possible considerations or limitations?
Thanks a lot,