I have an imputed file output from BEAGLE (IMPUTED FILE). Rows are SNPs and columns are individuals.
Now I would like to calculate some imputation accuracy estimates, such as concordance or Rsq, and then plot them across MAF (minor allele frequency). How do I calculate them? Are there any tools that can generate such statistics?
I would also like to determine imputation accuracy in our GBS dataset. Here is what I did:
1) Randomly selected 1% of SNPs: zcat all.vcf.gz | awk '$1~/^#/ || rand()<=0.01' | bgzip -c > eval.vcf.gz
2) Excluded the evaluation sites from the original VCF: bcftools isec -C all.vcf.gz eval.vcf.gz -Oz > impute.vcf.gz
3) Imputed the missing data using Beagle v4: java -Xmx100g -jar beagleV4.1.jar gt=impute.vcf.gz out=imputed window=100 overlap=30 niterations=10
When I compared imputed.vcf.gz and eval.vcf.gz using vcf-compare, I got the following output:
SN Number of REF matches: 0
SN Number of ALT matches: 0
SN Number of REF mismatches: 0
SN Number of ALT mismatches: 0
SN Number of samples in GT comparison: 0
Thanks, Zev, for your reply. I have already done step 1, where I removed bad-quality calls, and then masked the genotyped file, which I then used in BEAGLE for imputation. Now I have the imputed file and the original one, and from these files I would like to get accuracy estimates such as concordance. The imputed file is the one I attached in the original post. Any idea how to get those estimates?
You can use vcf-compare or bcftools stats to get statistics, which you can then plot using plot-vcfstats. Can you please let me know how you performed step 1 and step 2? I don't have a reference panel.
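If it helps to see what those statistics mean, here is a minimal sketch of computing per-SNP concordance and a dosage r-squared by hand, assuming biallelic diploid genotypes already pulled out of the two files as GT strings in the same sample order. The function names (dosage, site_accuracy, mean_by_maf_bin) and the MAF bin edges are made up for illustration and are not part of any tool mentioned in this thread. The r2 here is the squared Pearson correlation between true and imputed ALT allele counts, which is not the same thing as an Rsq estimated by the imputation software itself.

```python
def dosage(gt):
    """ALT allele count for a biallelic GT such as '0|1' or '1/1'; None if missing."""
    alleles = gt.replace("|", "/").split("/")
    if any(a in (".", "") for a in alleles):
        return None
    return sum(int(a) != 0 for a in alleles)


def site_accuracy(true_gts, imputed_gts):
    """Concordance, squared correlation and MAF for one SNP (hypothetical helper)."""
    pairs = [(dosage(t), dosage(i)) for t, i in zip(true_gts, imputed_gts)]
    # Only samples with a known genotype in both files are compared.
    pairs = [(t, i) for t, i in pairs if t is not None and i is not None]
    n = len(pairs)
    if n == 0:
        return None
    concordance = sum(t == i for t, i in pairs) / n
    mean_t = sum(t for t, _ in pairs) / n
    mean_i = sum(i for _, i in pairs) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in pairs)
    var_t = sum((t - mean_t) ** 2 for t, _ in pairs)
    var_i = sum((i - mean_i) ** 2 for _, i in pairs)
    # r2 is undefined when either column has no variation (monomorphic site).
    r2 = cov * cov / (var_t * var_i) if var_t > 0 and var_i > 0 else None
    alt_freq = mean_t / 2  # assumes diploid samples
    return {"n": n, "concordance": concordance, "r2": r2,
            "maf": min(alt_freq, 1 - alt_freq)}


def mean_by_maf_bin(results, edges=(0.0, 0.05, 0.1, 0.2, 0.3, 0.5)):
    """Average concordance per MAF bin; the top bin includes its upper edge."""
    bins = {}
    for r in results:
        for lo, hi in zip(edges, edges[1:]):
            if lo <= r["maf"] < hi or (hi == edges[-1] and r["maf"] == hi):
                bins.setdefault((lo, hi), []).append(r["concordance"])
                break
    return {b: sum(v) / len(v) for b, v in bins.items()}
```

Calling site_accuracy once per SNP and passing the list of results to mean_by_maf_bin gives one value per MAF bin, which can then be plotted with whatever plotting tool you prefer.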
I'm trying to do the same thing, except that my data is multi-allelic. How do we calculate imputation accuracy for multi-allelic data? I'd appreciate any help on that. Thanks!
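For multi-allelic sites, one simple option (an assumption here, not an established standard from this thread) is to define concordance as an exact match of the unordered allele pair, so that 0/2 and 2|0 count as the same genotype. A rough sketch, with the hypothetical helper names allele_pair and multiallelic_concordance:

```python
def allele_pair(gt):
    """Unordered allele indices for a GT such as '2|0'; None if any allele is missing."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return tuple(sorted(int(a) for a in alleles))


def multiallelic_concordance(true_gts, imputed_gts):
    """Fraction of samples whose imputed allele pair equals the true one, ignoring phase."""
    compared = 0
    matched = 0
    for t, i in zip(true_gts, imputed_gts):
        tp, ip = allele_pair(t), allele_pair(i)
        if tp is None or ip is None:
            continue  # skip samples without a known genotype in both files
        compared += 1
        matched += tp == ip
    return matched / compared if compared else None
```

This only covers genotype concordance. A correlation-style measure would need a per-allele dosage for each ALT allele, which is not shown here.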
To estimate the quality of imputation, I think imputed.vcf.gz should be compared with all.vcf.gz and not with eval.vcf.gz :)
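A related point: without a reference panel, Beagle only fills in missing genotypes at sites that are present in the input, so sites removed from impute.vcf.gz would not appear in the imputed output at all, which may be why the comparison above reports zero matches. An alternative is to keep every site and mask individual genotype calls instead, then compare only the masked calls against the original file. Below is a minimal sketch of that masking for one VCF data line. The function name mask_genotypes and its arguments are hypothetical, and it assumes tab-separated VCF text with a GT entry in the FORMAT column.

```python
import random


def mask_genotypes(vcf_line, rate, rng):
    """Set a random fraction of known GT calls on one VCF line to './.'.

    Returns the (possibly modified) line and the 0-based indices of masked samples.
    """
    line = vcf_line.rstrip("\n")
    fields = line.split("\t")
    # Header lines and lines without sample columns are passed through unchanged.
    if line.startswith("#") or len(fields) < 10:
        return line, []
    fmt = fields[8].split(":")
    if "GT" not in fmt:
        return line, []
    gt_idx = fmt.index("GT")
    masked = []
    for col in range(9, len(fields)):
        parts = fields[col].split(":")
        if "." in parts[gt_idx]:
            continue  # already missing, nothing to evaluate later
        if rng.random() < rate:
            # Only GT is changed; other FORMAT subfields are left as they are.
            parts[gt_idx] = "./."
            fields[col] = ":".join(parts)
            masked.append(col - 9)
    return "\t".join(fields), masked
```

Using a seeded generator such as random.Random(1) makes the masking reproducible, and storing the returned sample indices per site records exactly which calls to score after imputation.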