8 months ago
machoo
Hi, I'm looking for a phased VCF reference file to use for phasing in Beagle v4.1, equivalent to the one provided by 1000 Genomes in the link below, but for mouse. Does anyone know if a dataset like this exists?
Aren't most mouse references inbred strains? If so, is it even possible to generate a phased VCF?
No negative implications, but experimental mice are typically either inbred (homozygous everywhere) or some kind of cross, so in general you should already know the haplotypes.
There were two initiatives to create genetically diverse experimental strains: the Collaborative Cross and Diversity Outbred projects, and the genotype data for these can be found here: https://www.jax.org/research-and-faculty/genetic-diversity-initiative/tools-data/diversity-outbred-reference-data
I'm not sure how well (statistical) phasing methods will work out-of-the-box for cross strains. I suspect the prior probability for haplotype switching is tuned for humans rather than for a few cross generations, and will therefore be too high. If you're looking at known mouse strains, you are probably better off with a custom approach that combines physical phasing and known founder-specific mutations to infer haplotypes.
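To make the founder-based idea concrete, here is a minimal sketch of one way it could look. It assumes you already have a physically phased stretch of alleles for one haplotype and a table of alleles for each inbred founder at the same markers. The function name `best_founders` and the toy alleles are made up for illustration; a real pipeline would also need to handle genotyping error and recombination breakpoints.

```python
def best_founders(founder_alleles, observed):
    """Score each founder by how many observed alleles it matches.

    founder_alleles: dict mapping founder name -> list of alleles, one per marker
                     (inbred founders carry a single allele per marker).
    observed:        list of alleles on one phased haplotype, None where missing.
    Returns (list of best-matching founders, dict of match counts).
    """
    scores = {}
    for founder, alleles in founder_alleles.items():
        scores[founder] = sum(
            1 for a, o in zip(alleles, observed) if o is not None and a == o
        )
    top = max(scores.values())
    best = sorted(f for f, s in scores.items() if s == top)
    return best, scores


# Toy data: the alleles below are invented, not real strain genotypes.
founders = {
    "C57BL/6J": ["A", "C", "G", "T"],
    "A/J":      ["A", "T", "G", "T"],
    "CAST/EiJ": ["G", "T", "A", "T"],
}
observed = ["G", "T", None, "T"]  # third marker has no call

best, scores = best_founders(founders, observed)
print(best, scores)
```

Running this over sliding windows along a chromosome would give a rough picture of which founder each segment came from, with changes in the best match hinting at recombination points.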
And if you've just got something like Black6 (C57BL/6), you should have very few true non-homozygous calls; most of the ones you see should be artifacts.
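A quick sanity check along these lines is to count how many called genotypes are heterozygous. The following is a rough sketch that parses uncompressed VCF text lines with plain Python. The function name `count_het_calls` and the example records are illustrative only, and in practice you might prefer an existing VCF library or bcftools.

```python
def count_het_calls(vcf_lines, sample_index=0):
    """Return (heterozygous, called) genotype counts for one sample.

    vcf_lines:    iterable of VCF text lines (header lines are skipped).
    sample_index: 0-based index of the sample column after FORMAT.
    """
    het = 0
    called = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        sample = fields[9 + sample_index].split(":")
        gt = sample[fmt.index("GT")]
        # Treat phased (|) and unphased (/) separators the same way.
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            continue  # missing call
        called += 1
        if len(set(alleles)) > 1:
            het += 1
    return het, called


# Invented example records, not real data.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t1/1:30",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP\t0/1:12",
    "chr1\t300\t.\tG\tA\t50\tPASS\t.\tGT:DP\t./.:0",
    "chr1\t400\t.\tT\tC\t50\tPASS\t.\tGT:DP\t0|0:25",
]
het, called = count_het_calls(example)
print(het, called)
```

For a real file you would pass an open file handle instead of the list. A het fraction well above what you expect for an inbred strain would point to mapping or calling artifacts, or to a sample that is not as inbred as assumed.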
Thank you LChart and Pierre for your answers, that makes a lot of sense. Beagle is a part of another tool I have been attempting to run, and I did not stop to think about the difficulties in predicting phase from a highly-inbred population (we are using a modified C57BL/6J).