I'm working with a resequenced genome of a non-reference species.
The VCF contains ~7 million SNPs, each with its position on its own chromosome. About 10.01% of the data are missing, so I need to impute these missing genotypes (NA). I eventually settled on Beagle v5 as a tool, since it can do this job even without a reference panel of phased and completely genotyped individuals.
However, Beagle also asks for a .map file with genetic distances in cM, which is giving me a lot of trouble. The species lacks a linkage map at the SNP level, so I was thinking of computing one starting from the population recombination rate; however, I'd obtain a single value, which is by no means useful for getting the different cM distances.
Indeed, when I ran Beagle with the output of PLINK 1.9, which has all the genetic distances set to 0, I got this error:
Exception in thread "main" java.lang.IllegalArgumentException: All loci in genetic map have the same genetic position [0.0]: CHROM_1
My current command line for PLINK, as suggested here, is:
plink1.9 --bfile ./PEDwithMorgans_v2/CHROMnumberBhagaVentoux --cm-map ./PEDwithMorgans_v2/Bhaga_@_103chrom_v2.txt --make-bed --recode --out ./PEDwithMorgans_v2/IndividualBhaga_v2_cms --allow-extra-chr
Also, I'm a tad confused by the usage of "genetic distance" here. Usually I assume it's a pairwise measure between different markers, but the map format clearly requires a single value per marker.
Can you please point me to a useful tool for this?
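To make the question concrete, here is a minimal sketch of how I currently read that single value: as a cumulative position in cM along the chromosome, so that the pairwise distance between two markers is just the difference of their positions. The sketch assumes a uniform recombination rate, and the 1 cM/Mb figure is only a placeholder, not a measured rate for my species. The helper names (`bp_to_cm`, `map_lines`) and the example positions are hypothetical and are not part of PLINK or Beagle; the output follows the four-column PLINK .map layout (chromosome, marker ID, cM position, bp position).

```python
# Placeholder assumption: uniform recombination rate, NOT measured for this species.
ASSUMED_CM_PER_MB = 1.0


def bp_to_cm(bp_position, cm_per_mb=ASSUMED_CM_PER_MB):
    """Cumulative genetic position (cM) of a marker under a uniform rate."""
    return bp_position / 1_000_000 * cm_per_mb


def map_lines(chrom, bp_positions, cm_per_mb=ASSUMED_CM_PER_MB):
    """Build PLINK-style .map lines: chrom, marker ID ('.'), cM, bp."""
    lines = []
    for bp in sorted(bp_positions):
        cm = bp_to_cm(bp, cm_per_mb)
        lines.append(f"{chrom}\t.\t{cm:.6f}\t{bp}")
    return lines


if __name__ == "__main__":
    # Made-up positions, for illustration only.
    positions = [150_000, 1_200_000, 3_750_000]
    for line in map_lines("CHROM_1", positions):
        print(line)

    # Pairwise distance between two markers = difference of their cM positions.
    pairwise_cm = bp_to_cm(3_750_000) - bp_to_cm(1_200_000)
    print(f"pairwise distance: {pairwise_cm:.2f} cM")
```

If this reading is right, a single genome-wide rate would still give different cM positions per SNP, but the map would simply mirror the physical positions and ignore any local variation in recombination.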
I deeply thank you in advance.