Hi,
I have a general query about how the various allele frequency terms are used when defining rare and common variants (both SNVs and SVs (structural variants)).
I have seen papers where people use population alternate allele frequency (AF) thresholds to define rare and common variants (for example gnomAD and gnomAD-SV). Many papers also use a minor allele frequency (MAF) threshold to define rare and common variants (pop gen papers).
I am aware of the definitions and how they are calculated. However, what I do not understand is whether the two can be used interchangeably or whether the choice is context specific, and why. Any example will be helpful, as I am struggling to understand this.
Regards, Prasun
The alternate allele is not always the minor one. The sample(s) that make up the human reference carry some rare alleles, and those rare alleles are nevertheless always the REF allele in a VCF. For this kind of variant, the frequency (AF) of the ALT allele will be close to 1.
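To make the distinction concrete, here is a minimal sketch, assuming a biallelic site where the only two alleles are REF and ALT. The function name `minor_allele_frequency` and the example frequencies are made up for illustration and are not taken from gnomAD.

```python
def minor_allele_frequency(alt_af: float) -> float:
    """Return the MAF of a biallelic site given the ALT allele frequency.

    Assumption: exactly two alleles (REF and ALT), so REF AF = 1 - ALT AF.
    """
    if not 0.0 <= alt_af <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    return min(alt_af, 1.0 - alt_af)


# Hypothetical ALT allele frequencies, for illustration only.
examples = {
    "ALT is the rare allele": 0.002,
    "REF is the rare allele": 0.998,
}

for label, alt_af in examples.items():
    maf = minor_allele_frequency(alt_af)
    print(f"{label}: AF(ALT) = {alt_af:.3f}, MAF = {maf:.3f}")
```

Both sites have the same MAF, but only in the first one is the ALT allele rare. In the second, the ALT allele is the common one even though the site would pass a "MAF < 0.01" threshold.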
Thanks, @Pierre Lindenbaun. So, if I have, for example, a VCF file with my variants (SV or SNV) and the gnomAD dataset, what would be the best way to remove common (most frequent) variants based on this population dataset? Would it be correct to simply remove those of my variants that match a gnomAD variant with, for example, a global AF or any population AF > 0.01?
As you may know, MAF values are absent from gnomAD, at least from the gnomAD-SV dataset.
My logic is that if the variant in question is present in many individuals in a population (a common one), it will be benign in nature (in an experiment to find potentially pathogenic variants). Of course, there are other parameters to apply as well. Is my understanding correct? I would remove the benign variants from my analysis.
Regards, Prasun
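The filter described in the question above (drop a variant if its global AF or any population AF is above 0.01) could be sketched roughly as follows. This is only an illustration under stated assumptions: variants are matched by exact (chrom, pos, ref, alt) key, which would not be enough for SVs; the names `keep_rare` and `max_population_af`, the population labels, and all frequencies are hypothetical; and a variant absent from the reference dataset is treated as rare.

```python
from typing import Dict, List, Tuple

# Hypothetical variant key: (chrom, pos, ref, alt). Exact matching is an
# assumption that suits SNVs; SVs would need overlap-based matching instead.
VariantKey = Tuple[str, int, str, str]


def max_population_af(pop_afs: Dict[str, float]) -> float:
    """Highest ALT allele frequency across the global and per-population values."""
    return max(pop_afs.values(), default=0.0)


def keep_rare(
    variants: List[VariantKey],
    reference_afs: Dict[VariantKey, Dict[str, float]],
    threshold: float = 0.01,
) -> List[VariantKey]:
    """Keep variants whose ALT AF does not exceed the threshold in any population.

    Variants missing from the reference dataset are kept (treated as rare).
    """
    kept = []
    for variant in variants:
        pop_afs = reference_afs.get(variant, {})
        if max_population_af(pop_afs) <= threshold:
            kept.append(variant)
    return kept


# Made-up frequencies, for illustration only (not real gnomAD values).
reference_afs = {
    ("chr1", 1000, "A", "G"): {"global": 0.20, "afr": 0.35},
    ("chr1", 2000, "C", "T"): {"global": 0.004, "eas": 0.03},
    ("chr2", 500, "G", "A"): {"global": 0.0001, "nfe": 0.0002},
}

my_variants = [
    ("chr1", 1000, "A", "G"),  # common globally
    ("chr1", 2000, "C", "T"),  # rare globally, common in one population
    ("chr2", 500, "G", "A"),   # rare everywhere
    ("chr3", 42, "T", "C"),    # absent from the reference dataset
]

rare_variants = keep_rare(my_variants, reference_afs)
for variant in rare_variants:
    print(variant)
```

Because the filter works on the frequency of the ALT allele itself, a site where the REF allele is the rare one (ALT AF close to 1) is removed as common, which would not happen with a MAF-based threshold.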