18 days ago
giulia.trauzzi
Hi everyone,
I am trying to remove duplicated SNPs from my pgen dataset. These duplicates are the result of splitting multiallelic loci, and at each locus I want to retain only the variant with the highest MAF, i.e. the most common one. Is there a way to do this with PLINK 2? The most common variant is not always the first instance in the data, so I cannot use --rm-dup first etc.
Many thanks.
Giulia
Since the MAF values differ, could you filter your data on MAF first, and then check whether any duplicates remain?
Python script to process the frequency file:
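The original script did not survive in this post, so what follows is only a minimal sketch of what such a step could look like, not the answerer's code. It assumes the frequency file was written by `plink2 --freq cols=+pos`, so that it is tab-delimited with `#CHROM`, `POS`, `ID` and `ALT_FREQS` columns, and that each split record has its own unique variant ID. The function names `pick_most_common_variants` and `write_keep_list` and the file names are hypothetical.

```python
import csv
import math
from typing import Dict, Iterable, List, Tuple


def pick_most_common_variants(afreq_lines: Iterable[str]) -> List[str]:
    """Return one variant ID per (chromosome, position): the one with the highest MAF.

    Assumes a tab-delimited PLINK 2 .afreq table that includes
    #CHROM, POS, ID and ALT_FREQS columns (POS is not written by default).
    """
    reader = csv.DictReader(afreq_lines, delimiter="\t")
    # Maps (chrom, pos) -> (best MAF seen so far, ID of that variant).
    best: Dict[Tuple[str, str], Tuple[float, str]] = {}
    for row in reader:
        alt_freq = float(row["ALT_FREQS"])
        if math.isnan(alt_freq):
            # Variants with no observations rank below everything else.
            maf = -1.0
        else:
            # Fold the ALT frequency into a minor allele frequency.
            # Use alt_freq directly here to rank by ALT frequency instead.
            maf = min(alt_freq, 1.0 - alt_freq)
        site = (row["#CHROM"], row["POS"])
        if site not in best or maf > best[site][0]:
            best[site] = (maf, row["ID"])
    # Dicts keep insertion order, so IDs come out in file order of their sites.
    return [variant_id for _, variant_id in best.values()]


def write_keep_list(afreq_path: str, out_path: str) -> None:
    """Read an .afreq file and write the selected IDs, one per line."""
    with open(afreq_path, newline="") as handle:
        keep_ids = pick_most_common_variants(handle)
    with open(out_path, "w") as out:
        out.write("\n".join(keep_ids) + "\n")
```

For example, `write_keep_list("mydata.afreq", "keep_ids.txt")` would produce a list that can be passed to `--extract`. Ties are resolved in favour of the first record at a site. If the split records share the same ID, an ID-based list cannot tell them apart, so the IDs would need to be made unique first.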
Create the final filtered dataset
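The command for this step is also missing from the post. One way it might be done, sketched here under the assumption that a file of variant IDs to keep already exists and that those IDs are unique, is to call PLINK 2 with `--extract` and `--make-pgen`. The helper name `build_plink2_command` and all paths are hypothetical.

```python
import subprocess
from typing import List


def build_plink2_command(pfile_prefix: str, keep_ids_path: str, out_prefix: str) -> List[str]:
    """Build a plink2 call that keeps only the listed variant IDs.

    --extract matches on variant ID, so this only removes duplicates
    when each split record carries its own unique ID.
    """
    return [
        "plink2",
        "--pfile", pfile_prefix,      # input .pgen/.pvar/.psam prefix
        "--extract", keep_ids_path,   # text file with one variant ID per line
        "--make-pgen",                # write a new pgen fileset
        "--out", out_prefix,
    ]


def run_plink2(pfile_prefix: str, keep_ids_path: str, out_prefix: str) -> None:
    """Run the command and raise if plink2 exits with an error."""
    cmd = build_plink2_command(pfile_prefix, keep_ids_path, out_prefix)
    subprocess.run(cmd, check=True)
```

Calling `run_plink2("mydata", "keep_ids.txt", "mydata_dedup")` would then write `mydata_dedup.pgen`, `.pvar` and `.psam`, provided `plink2` is on the `PATH`.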
I generated this answer using amplicon.ai, a tool I've been building to make writing bioinformatics code easier. Feel free to try it out.