8.2 years ago
reneesummer
I have a list of SNPs that I want to prune to a smaller set. I used to prune with PLINK, but the genotypes I now have for each SNP are no longer of the form AA, AT, TT. Instead, because they are imputed, they can take any numerical value between 0 and 2, such as 1.9986.
Can someone offer me some advice on how to deal with this situation? Thank you very much!
These are dosage/probability values. The closer a value is to 0, 1 or 2, the more certain the imputation is about the actual genotype. Usually 0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternative.
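To make the interpretation concrete, here is a minimal sketch of how a dosage is commonly derived from the three genotype probabilities that imputation software reports. The function name and the example probabilities are hypothetical, and which allele is counted (alternative, minor, or effect allele) depends on the software, so check its documentation.

```python
def expected_dosage(p_hom_ref, p_het, p_hom_alt):
    """Expected count of the alternative allele given three genotype probabilities.

    Assumes the probabilities are ordered (hom ref, het, hom alt) and sum to 1.
    """
    total = p_hom_ref + p_het + p_hom_alt
    if abs(total - 1.0) > 1e-6:
        raise ValueError("genotype probabilities should sum to 1")
    # 0 copies * P(hom ref) + 1 copy * P(het) + 2 copies * P(hom alt)
    return p_het + 2.0 * p_hom_alt


# Hypothetical probabilities: the sample is almost certainly homozygous alternative.
dosage = expected_dosage(0.0, 0.0014, 0.9986)
print(round(dosage, 4))
```

Under these assumed probabilities the dosage comes out just below 2, which is why a value like 1.9986 can be read as "very likely two copies of the counted allele".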
Thanks, Floris. I know 0, 1 and 2 are counts of the minor allele in a SNP genotype. But the imputed genotypes generated by some imputation software are decimals such as 1.9986 rather than the integers 0, 1 or 2.
First, I am having trouble interpreting 1.9986. Second, I wonder whether any pruning software can take values like 1.9986 as genotype input.
Thanks!
Using PLINK2: plink --vcf FILE.vcf.gz --double-id --make-bed --out s1 converts the VCF into a PLINK binary fileset, which you can then prune.
If you want to be a bit more stringent, you can add --vcf-min-gp 0.85, which filters out imputed genotype calls with a probability below 0.85: plink --vcf FILE.vcf.gz --double-id --make-bed --out b1 --vcf-min-gp 0.85
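As a conceptual illustration of what a genotype-probability threshold such as --vcf-min-gp 0.85 does, the sketch below turns genotype probabilities into hard calls and leaves uncertain ones missing. This is not PLINK's implementation; the function name and example values are hypothetical.

```python
def hard_call(probs, min_gp=0.85):
    """Return the most likely genotype (0, 1 or 2), or None if it is too uncertain.

    probs is assumed to be (P(hom ref), P(het), P(hom alt)).
    """
    best = max(range(3), key=lambda g: probs[g])
    # Keep the call only when its probability reaches the threshold.
    return best if probs[best] >= min_gp else None


# Hypothetical samples: one confident call, one ambiguous call.
samples = [(0.0, 0.0014, 0.9986), (0.5, 0.4, 0.1)]
calls = [hard_call(p) for p in samples]
print(calls)
```

The first sample is called as genotype 2, while the second has no genotype reaching 0.85 and is treated as missing.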
Besides that, at some point I would also recommend filtering out low-LD SNPs. A threshold of 0.3 is typically used.
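On the original question of whether pruning can work with values like 1.9986: pruning rests on the pairwise correlation between SNPs, and that can in principle be computed on dosages directly. The following is only a rough sketch under that assumption, not how PLINK prunes (it has no sliding window and no hard calls); all names, the example dosages, and the r² cutoff are hypothetical.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two dosage vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no correlation information
    return sxy * sxy / (sxx * syy)


def greedy_prune(dosages, max_r2):
    """Keep each SNP only if its r^2 with every already-kept SNP is <= max_r2."""
    kept = []
    for name, values in dosages.items():
        if all(r_squared(values, dosages[k]) <= max_r2 for k in kept):
            kept.append(name)
    return kept


# Hypothetical dosages for three SNPs across four samples.
example = {
    "snp_a": [0.01, 1.02, 1.9986, 0.98],
    "snp_b": [0.0, 1.0, 2.0, 1.0],  # nearly identical to snp_a
    "snp_c": [1.0, 0.0, 1.0, 2.0],  # almost uncorrelated with snp_a
}
print(greedy_prune(example, max_r2=0.5))
```

Here snp_b is dropped because it is almost perfectly correlated with snp_a, while snp_c is kept.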