2.6 years ago
dec986
I have been asked to estimate race from a VCF.
I found the thread "Calculating ethnicity of a sample VCF", which recommends the ADMIXTURE package, but this isn't exactly what I want.
I want to take a VCF and determine what race the person is. ADMIXTURE doesn't appear to be able to do that.
The thread "How to predict sample ethnicity from a VCF using 1000G or gnomAD?" also points to https://github.com/brentp/peddy, which is another close package but doesn't do exactly what I need.
What software packages exist that can estimate ethnicity?
Race != ethnicity. You can't tell someone's race from a VCF, only their ethnicity. What's wrong with using ADMIXTURE? If you use it in supervised mode, you will at least be able to infer ancestry proportions.
From the ADMIXTURE manual:
"ADMIXTURE is a program for estimating ancestry in a model-based manner from large autosomal SNP genotype datasets, where the individuals are unrelated (for example, the individuals in a case-control association study)."
The individuals in the VCFs that I'm working on are almost certainly related to one another. That's why I didn't consider ADMIXTURE. Perhaps I was wrong to think that?
Also, the links to the ADMIXTURE website at UCLA are broken.
Right - I suspect that won't matter much if you run it in supervised mode with a reference panel such as the 1000 Genomes. The exact proportions may be a bit off, but I think it will get the ancestry basically right.
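For what it's worth, supervised mode expects a `.pop` file next to the PLINK `.bed`/`.fam` files, with one line per individual in `.fam` order: the population label for reference samples and `-` for the samples whose ancestry you want estimated. Below is a minimal Python sketch of building those labels. It assumes you have already merged your VCF with the reference panel and converted it to PLINK format; the function name `build_pop_labels`, the family and query IDs, and the sample-to-population mapping are made up for illustration.

```python
def build_pop_labels(fam_lines, reference_pops):
    """Return one label per .fam row: the population name for
    reference samples, '-' for samples of unknown ancestry."""
    labels = []
    for line in fam_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        sample_id = fields[1]  # column 2 of a PLINK .fam file is the individual ID
        labels.append(reference_pops.get(sample_id, "-"))
    return labels


# Illustrative data only: two reference samples and one hypothetical query sample.
fam = [
    "HG00096 HG00096 0 0 1 -9",
    "NA19238 NA19238 0 0 2 -9",
    "FAM1 query01 0 0 0 -9",
]
ref = {"HG00096": "EUR", "NA19238": "AFR"}  # assumed sample-to-population mapping

labels = build_pop_labels(fam, ref)
print("\n".join(labels))
```

If the labels are written to, say, `merged.pop` alongside `merged.bed`, the run would be something like `admixture --supervised merged.bed 2`, where K (here 2) should match the number of reference populations. Check the manual for your version before relying on this.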
Do you have a WGS VCF? The SNPs that best predict population are not in protein-coding genes. If you have the mitochondrial (MT) variants, you can use Mitomaster to determine the sample's mitochondrial haplogroup.
I do not have mitochondrial WGS, unfortunately