Interpretation of MSMC2/PSMC/betaPSMC/ngsPSMC
6 months ago
vap768 • 0

Hello, I am new to demographic history inference using software like https://github.com/lh3/psmc and https://github.com/stschiff/msmc2. I am not very familiar with the algorithms that underlie these programs and have mostly a conceptual understanding of how they work, so I have tried a variety of programs on my data: 7 high-coverage (35x+) genomes from a non-model species and 190+ low-coverage genomes. Because this is a non-model species, genetic maps etc. are not available, and I am not sure that imputing from our 7 high-coverage genomes makes sense, since we have only 1 or 2 individuals sequenced at high coverage per population.

For inferring demographic history I have tried MSMC2, PSMC, beta-PSMC (https://github.com/ChenHuaLab/Beta-PSMC) (for more recent history), and ngsPSMC (https://github.com/ANGSD/ngsPSMC) for the low-coverage genomes (5x). I was hoping to convince myself of the true demographic history by getting similar results from each of these programs. However, the results differ quite dramatically between programs, and also between different -p flags (which set the time intervals). For each program I have tried to set similar time segmentation patterns and to check for overfitting, following some advice given here for PSMC (https://github.com/lh3/psmc/issues/45); by that guidance the models don't seem overfit. My question is: how should I interpret differences such as an initial rise in Ne in my red populations (this run used -p "16+58*1" and 25 iterations) versus declines in the same populations with other parameters (i.e. -N25 -r5 -p "26*2+4+7+1", or the beta-PSMC parameters -p "16+58*1" -i 25)? See attachments for examples. Are these artifacts, or potentially real changes that appear because I have increased my resolution? How can I check this? Is the big difference in Ne values (sometimes 10x greater) between MSMC and PSMC expected?
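One thing worth ruling out for the Ne gap is scaling, since the two programs report their raw output in different units. Below is a minimal sketch of the conversions as I understand them from the PSMC README and the MSMC guide. The function names are hypothetical, and the mutation rate and generation time in the usage lines are placeholders, not values for this species.

```python
def psmc_scale(theta0, t_k, lambda_k, mu, gen_years, bin_size=100):
    """Scale one PSMC interval to (years ago, Ne).

    Assumes theta0 is per bin (PSMC's default bin is 100 bp), so
    N0 = theta0 / (4 * mu * bin_size), time = 2 * N0 * t_k generations,
    and Ne = N0 * lambda_k.
    """
    n0 = theta0 / (4.0 * mu * bin_size)
    years = 2.0 * n0 * t_k * gen_years
    ne = n0 * lambda_k
    return years, ne


def msmc2_scale(left_time_boundary, lam, mu, gen_years):
    """Scale one MSMC2 interval to (years ago, Ne).

    Assumes times are in units of the per-generation mutation rate and
    lam is the scaled coalescence rate, so Ne = (1 / lam) / (2 * mu).
    """
    years = left_time_boundary / mu * gen_years
    ne = (1.0 / lam) / (2.0 * mu)
    return years, ne


# Placeholder inputs, only to show the calls.
psmc_years, psmc_ne = psmc_scale(theta0=0.001, t_k=0.5, lambda_k=2.0,
                                 mu=1e-8, gen_years=5)
msmc_years, msmc_ne = msmc2_scale(left_time_boundary=1e-4, lam=1000.0,
                                  mu=1e-8, gen_years=5)
```

If the same mu and generation time go into both conversions and the curves still differ by an order of magnitude, the gap is probably not a plotting or scaling slip.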

Other notes: I am currently using unphased genomes (one at a time in MSMC2), with mask files created as documented in the MSMC instructions. I am also planning to phase the genomes (with WhatsHap, since we have no map files for SHAPEIT) and run the two-sample mode to look at divergence times between two populations of interest.

Thanks, and please forgive my ignorance and confusion.

[Five attached plots, not shown]

Demography PSMC MSMC • 723 views
12 weeks ago
csarabia • 0

Hi,

Your question is very intriguing (no ignorance or confusion to pardon!). You are pointing out exactly the reason why PSMC is normally not considered reliable for times more recent than about 20-30 kyr ago. However, ngsPSMC seems to work well for more recent times (we tested it here: https://onlinelibrary.wiley.com/doi/full/10.1111/mec.15784).

How can you get rid of those extreme recent bumps in Ne in PSMC? Check out Hilgers et al. 2025 (https://www.sciencedirect.com/science/article/pii/S0960982224012399), who seem to have answered your question.

What they suggest is using, as you have done in your first plot, smaller segments at the beginning of the pattern (the most recent time intervals). Instead of the recommended default, "4+25*2+4+6", they use "2+2+25*2+4+6" or "1+1+1+1+25*2+4+6" and get much better results. In your case, I would perhaps test a modification of your third plot, "2+2+60*1". Let us know how it goes!
