Question

Allele frequency calculation

0

8.7 years ago

nkausthu ▴ 40

I have ~200 exomes from both related and unrelated individuals. I have done joint genotype calling and calculated allele frequencies using VCFtools. But as it's a mixed population, what is the ideal way to calculate allele frequencies? I would like to make an in-house variant database from our available exomes, and the corresponding allele frequencies will be used to filter variants. It would be helpful if you could give some further information about methods to adjust for relatedness. Thank you.

Allele frequency Joint genotype calling • 4.2k views

ADD COMMENT • link 8.7 years ago by nkausthu ▴ 40

0

You could use GATK SelectVariants to subset the VCF file accordingly and then calculate the AF.

ADD REPLY • link 8.7 years ago by Dave Tang ▴ 210

0

Actually, I would like to know: if I include all these related individuals along with the unrelated individuals when calculating the allele frequency, will it be biased? And is there a statistical way to avoid this bias?

ADD REPLY • link 8.7 years ago by nkausthu ▴ 40

0

I would expect genotypes to be more similar amongst related individuals, which would skew the AFs. Depending on what you want to do, there are methods for adjusting for relatedness.

ADD REPLY • link 8.7 years ago by Dave Tang ▴ 210

0

I would like to make an in-house variant database from our available exomes, and the corresponding allele frequencies will be used to filter variants. It would be helpful if you could give some further information about methods to adjust for relatedness. Thank you.

ADD REPLY • link 8.7 years ago by nkausthu ▴ 40

1

Please add this information to your initial post and try to be as informative as possible when asking questions. Those details are very important.

For filtering, you don't want related individuals to inflate the allele frequencies. I think the only correct way of creating such a database would be to count a variant shared by, e.g., three sibs just once. You should count in how many families each variant is observed, because observations within a family are not independent. An easier way (though you will lose information) would be to exclude related individuals, i.e. randomly choose one individual per family.

ADD REPLY • link 8.7 years ago by WouterDeCoster 48k
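
The two options in the reply above can be sketched roughly as follows. This is only an illustration: the sample IDs, the family labels, the `family_of` and `dosage` mappings, and both function names are hypothetical, and genotypes are assumed to have been reduced to alt-allele dosages (0, 1 or 2) at a single site.

```python
import random
from collections import defaultdict


def families_with_variant(family_of, dosage):
    """Return the set of families in which at least one member carries the variant."""
    return {family_of[sample] for sample, d in dosage.items() if d > 0}


def one_per_family(family_of, seed=0):
    """Pick one sample per family at random (the simpler alternative that loses information)."""
    members = defaultdict(list)
    for sample, fam in family_of.items():
        members[fam].append(sample)
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    return {fam: rng.choice(sorted(samples)) for fam, samples in members.items()}


# Hypothetical toy data: three sibs in famA plus two unrelated singletons.
family_of = {"s1": "famA", "s2": "famA", "s3": "famA", "s4": "famB", "s5": "famC"}
dosage = {"s1": 1, "s2": 1, "s3": 1, "s4": 0, "s5": 1}

carriers = families_with_variant(family_of, dosage)
print(len(carriers), "of", len(set(family_of.values())), "families carry the variant")

picked = one_per_family(family_of)
```

Counting families keeps every variant seen anywhere in a pedigree, whereas picking one sample per family drops variants carried only by the members who were not chosen.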

0

You are absolutely right! Removing redundant variants from related individuals is something I thought about, but the problem is which zygosity I should keep. E.g., for the same variant seen as het/het/hom in three related individuals, which genotype do I keep and which do I remove? As you already said, if I consider only one individual from each family then I will lose many variants. So I am a bit confused.

ADD REPLY • link 8.7 years ago by nkausthu ▴ 40

0

Don't count a variant twice if the two observations are from the same family; it still counts as one.

ADD REPLY • link 8.7 years ago by WouterDeCoster 48k

0

Just consider the following 3 scenarios:

3 related individuals - het/het/het - this will be taken as 1 allele count
3 related individuals - hom/hom/hom - this will be taken as 2 allele counts
3 related individuals - het/hom/het - what will be the allele count in this scenario?

ADD REPLY • link 8.7 years ago by nkausthu ▴ 40

0

I would simplify scenario 3 to 2 allele counts. But it's imperfect.

ADD REPLY • link 8.7 years ago by WouterDeCoster 48k
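
One way to encode the rule that emerges from these three scenarios is to take the highest alt-allele dosage seen in a family as that family's allele count. The sketch below does that; the function names and toy data are hypothetical, and treating each family as a single diploid individual (2 chromosomes) in the denominator is an assumption made here, not something stated in the thread. As noted above, the rule itself is imperfect.

```python
def family_allele_count(genotypes):
    """Collapse one family's alt-allele dosages (0/1/2) to a single count: the maximum."""
    return max(genotypes, default=0)


def collapsed_af(families):
    """Allele frequency with each family counted as one diploid individual (assumption)."""
    total_alt = sum(family_allele_count(g) for g in families.values())
    total_chroms = 2 * len(families)
    return total_alt / total_chroms if total_chroms else 0.0


# The three scenarios from the thread, with het = 1 and hom = 2.
assert family_allele_count([1, 1, 1]) == 1  # het/het/het
assert family_allele_count([2, 2, 2]) == 2  # hom/hom/hom
assert family_allele_count([1, 2, 1]) == 2  # het/hom/het

# Hypothetical cohort: the three families above plus one unrelated non-carrier.
families = {"famA": [1, 1, 1], "famB": [2, 2, 2], "famC": [1, 2, 1], "famD": [0]}
af = collapsed_af(families)  # (1 + 2 + 2 + 0) / (2 * 4)
```

Taking the maximum is conservative for filtering purposes: it never reports a family as carrying fewer alt alleles than its most extreme member.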

1

You can adjust population allele frequencies for relatedness using the ideas described above. In my experience, and as seen in the 1000 Genomes Project, some people report their relatedness and ethnic group incorrectly. Because of this I strongly recommend that you test for relatedness based on the VCF files you have. You can use KING (http://people.virginia.edu/~wc9c/KING/manual.html) for this. You can then use a priori probabilities given the relatedness to correct the counts, so that each allele is counted approximately once.

Another thing to consider is storing the number of alternative homozygotes and heterozygotes you observed without pathology. This is useful for testing inheritance models and penetrance.

ADD REPLY • link 8.7 years ago by Petr Ponomarenko ★ 2.8k
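
Once pairwise kinship coefficients are available (for example from KING), samples can be grouped into inferred families before any per-family counting. The sketch below is one simple way to do that with a union-find structure. It assumes the coefficients have already been parsed into `(sample1, sample2, kinship)` tuples; the function name and toy values are hypothetical, and the default cut-off of 0.0884 is the lower bound KING documents for second-degree relatives, used here as an assumed choice rather than a recommendation.

```python
def cluster_families(samples, kinship_pairs, threshold=0.0884):
    """Group samples into inferred families by linking pairs with kinship >= threshold."""
    parent = {s: s for s in samples}

    def find(x):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, kinship in kinship_pairs:
        if kinship >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for s in samples:
        clusters.setdefault(find(s), set()).add(s)
    return sorted(clusters.values(), key=lambda c: sorted(c))


# Hypothetical kinship estimates: s1-s2-s3 are close relatives, s4 and s5 are unrelated.
samples = ["s1", "s2", "s3", "s4", "s5"]
pairs = [("s1", "s2", 0.25), ("s2", "s3", 0.24), ("s4", "s5", 0.01)]

inferred = cluster_families(samples, pairs)
```

Linking is transitive, so two samples end up in the same inferred family if any chain of related pairs connects them, even when their own pairwise estimate falls below the threshold.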

0

Your post does not explain who these people are or what you are trying to accomplish. Or even what kind of data you have. Please clarify it, in great detail.

ADD REPLY • link 8.7 years ago by Brian Bushnell 20k