Check if REF allele is minor allele in any variant
10.1 years ago
Ram 44k

From the discussions in previous questions, I understand that REF and ALT need not necessarily correspond to major and minor alleles. REF is from the ref genome and could very well be the minor allele for the variant.

I'd like to find out if a REF allele is a minor allele for any variant in my region of interest. One of the ways I could do this is to find out COUNT(variants) where af > 0.5 in my region of interest.

Would I be correct in assuming this approach will definitely give me the right answer? Is there any underlying assumption I'm missing before I use this as my standard approach?

Any anomalies you might have noted in your experience would help me. Thank you!

minor-allele REF variant • 6.5k views
10.1 years ago

You're right: REF is just a convention for the allele that matches the reference genome. If, for whatever reason, you want to know for which variants the REF allele is the minor allele, looking at AF should do. Since REF is only a convention, there is no particular biological interest in finding that out, other than describing and characterizing the reference genome itself.

Keep in mind, though, that the AF calculation is not always as straightforward as it seems. For instance, take a multi-allelic variant where the major allele is one alternative allele and the minor allele is another alternative allele. If the first alternative allele is the major one, filtering on its frequency alone (> 0.5) would output the variant even though the REF allele is not the minor allele. You need the AF to account for all alternative alleles, which can be done with, for instance, bcftools view -q 0.5 file.vcf: by default, -q computes the frequency over all non-reference alleles.
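To make the multi-allelic caveat concrete, here is a minimal Python sketch. It assumes the INFO field carries a standard comma-separated AF tag with one value per ALT allele; the function names (parse_af, passes_alt1, passes_nref, ref_is_minor) are hypothetical helpers written for this illustration, not part of bcftools, and whether a real filter treats a value exactly at the threshold as passing is not something this sketch tries to reproduce.

```python
def parse_af(info):
    """Return the per-ALT allele frequencies from a VCF INFO string."""
    for field in info.split(";"):
        if field.startswith("AF="):
            return [float(x) for x in field[3:].split(",")]
    raise ValueError("no AF tag in INFO")


def passes_alt1(alt_afs, threshold=0.5):
    """Filter on the first ALT allele only (hypothetical helper)."""
    return alt_afs[0] > threshold


def passes_nref(alt_afs, threshold=0.5):
    """Filter on the summed frequency of all non-reference alleles."""
    return sum(alt_afs) > threshold


def ref_is_minor(alt_afs):
    """True if REF is strictly rarer than every ALT allele at the site."""
    ref_af = 1.0 - sum(alt_afs)
    return all(ref_af < af for af in alt_afs)


# Made-up example sites, not real 1000genomes records.
sites = {
    "biallelic, ALT is major": "AC=7;AF=0.7;AN=10",
    "first ALT major, second ALT minor": "AC=6,1;AF=0.6,0.1;AN=10",
    "two ALTs, REF still most frequent": "AC=3,3;AF=0.3,0.3;AN=10",
}

for label, info in sites.items():
    afs = parse_af(info)
    print(label, passes_alt1(afs), passes_nref(afs), ref_is_minor(afs))
```

Note that a non-reference frequency above 0.5 only says that the REF frequency is below 0.5. At a multi-allelic site, whether that makes REF "the" minor allele depends on how you define minor there, so it may be worth checking the per-allele frequencies for those sites.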


Thank you, Jorge. Your reply on a different post was one of my references for the REF/ALT definition. We have a local DB created from reformatted/processed 1000genomes. I'll check on how the DBA dealt with multi-allelic variants.

If they haven't dealt with it the right way, I can always use your bcftools command with the raw VCF. Thank you :-)


The cases where filtering by AF > 0.5 fails because AF is calculated from the first alternative allele only are rare, but it is still more appropriate to take them into account. Also keep in mind that filtering the raw 1000genomes data by AF also returns indels, which I'm not sure would help you achieve your goal.


I ran a bunch of queries on my DB. There were no multi-allelic variants of any kind, and for all variants with exactly one REF and one ALT allele, I found no case where af was >= 0.5. I guess I can safely assume that all ALT alleles are minor alleles in my sample space.


There are indeed multi-allelic variants in the latest 1000genomes release (earlier releases collapsed them to bi-allelics), as stated in the callset readme file, and plenty of variants with AF > 0.5 too. If you don't find any yourself, then it depends on the way you've built your database, or on the region or the samples you are considering.


It is the region, I am quite certain. We store multi-allelic variants as multiple records, one record per ALT allele in an SQL database.
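For a database laid out this way (one record per ALT allele), a per-record af > 0.5 check and a per-site check can give different counts. The sketch below uses Python's built-in sqlite3 module with an in-memory table; the table name variants, its columns, and the rows are all hypothetical stand-ins, not the actual schema or data discussed here. It also assumes SNVs only, so that all records of a site share the same chrom, pos and ref.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variants (chrom TEXT, pos INTEGER, ref TEXT, alt TEXT, af REAL)"
)

# Made-up rows: multi-allelic sites are stored as one record per ALT allele.
rows = [
    ("1", 100, "A", "G", 0.70),  # bi-allelic, ALT is the major allele
    ("1", 200, "C", "T", 0.60),  # multi-allelic, first ALT
    ("1", 200, "C", "G", 0.10),  # multi-allelic, second ALT
    ("1", 300, "G", "A", 0.30),  # multi-allelic, neither ALT above 0.5 alone
    ("1", 300, "G", "T", 0.30),
    ("1", 400, "T", "C", 0.05),  # bi-allelic, REF is the major allele
]
conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?, ?)", rows)

# Per-record check: each ALT allele is tested on its own.
per_record = conn.execute(
    "SELECT COUNT(*) FROM variants WHERE af > 0.5"
).fetchone()[0]

# Per-site check: sum the ALT frequencies of all records at the same site.
per_site = conn.execute(
    """
    SELECT chrom, pos, ref, SUM(af) AS nonref_af, COUNT(*) AS n_alt
    FROM variants
    GROUP BY chrom, pos, ref
    HAVING SUM(af) > 0.5
    ORDER BY chrom, pos
    """
).fetchall()

print("records with af > 0.5:", per_record)
print("sites with summed non-reference af > 0.5:", [r[1] for r in per_site])
```

In this toy data the site at position 300 is missed by the per-record check, because no single ALT allele exceeds 0.5 even though REF is carried by fewer than half of the chromosomes.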


On the indels, I'm targeting only SNVs anyway.

6.3 years ago
Shicheng Guo ★ 9.6k

In the phase III data set from the 1000 Genomes Project, only 2149549/84802133 = 2.5% of variants have an alternative allele frequency greater than 50%. This makes sense: the human reference genome is derived from several individuals' genomes, so the reference allele has a high probability of being the major allele.
