How Is Heterozygosity Determined In A Consensus Fasta?
11.2 years ago
Justin ▴ 470

I have a consensus fasta file generated by samtools/bcftools/vcfutils as discussed in a previous post here: How to generate a consensus fasta sequence from SAM tools pileup?

At some sites I see an IUPAC ambiguity code, e.g. R, which means the site was called heterozygous A/G.

How does the pipeline decide that a site is heterozygous? E.g. if you see 50 A's and 39 G's, how do you know it's a het? Does it use a minor-allele percentage threshold, or a probability model? If it uses a probability model, are there any references that describe the math?

consensus samtools • 3.7k views

Thanks!

For those interested in the math, see the heading "Consensus genotype calling" in the "Methods" section of the paper.

Basically, it's a Bayesian probability model: you want the posterior P(g | D) for genotype g given the observed data D. Bayes' theorem gives P(g | D) = P(D | g) P(g) / P(D), so you need the likelihood P(D | g), which the paper explains how to compute, together with a prior P(g). You then estimate the genotype as g* = argmax[g] P(g | D).
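
To make the argmax step concrete, here is a minimal sketch in Python. It is not the model from the paper: it assumes independent reads, a single fixed per-base error rate, only the two observed alleles (A and G), and made-up prior values. The function names and the defaults `error_rate` and `het_prior` are hypothetical, chosen only for illustration.

```python
import math

def genotype_posteriors(n_a, n_g, error_rate=0.01, het_prior=0.001):
    """Posterior P(g | D) for g in {AA, AG, GG}, given counts of A and G reads.

    Assumptions (illustrative only): reads are independent, every base has the
    same error rate, and errors only swap A and G.
    """
    # Probability that a single read shows G under each genotype.
    p_g_read = {"AA": error_rate, "AG": 0.5, "GG": 1.0 - error_rate}
    # Assumed priors: heterozygotes are rare, homozygotes share the rest.
    priors = {
        "AA": (1.0 - het_prior) / 2.0,
        "AG": het_prior,
        "GG": (1.0 - het_prior) / 2.0,
    }

    # log P(D | g) + log P(g); the binomial coefficient is the same for
    # every genotype, so it cancels in the normalization and is omitted.
    log_post = {}
    for g, p in p_g_read.items():
        log_lik = n_g * math.log(p) + n_a * math.log(1.0 - p)
        log_post[g] = log_lik + math.log(priors[g])

    # Normalize with log-sum-exp to avoid underflow at high depth.
    m = max(log_post.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in log_post.values()))
    return {g: math.exp(v - log_z) for g, v in log_post.items()}

def call_genotype(n_a, n_g, **kwargs):
    """Return g* = argmax over g of P(g | D)."""
    post = genotype_posteriors(n_a, n_g, **kwargs)
    return max(post, key=post.get)

if __name__ == "__main__":
    print(call_genotype(50, 39))  # the example from the question
    print(call_genotype(88, 1))   # a single G read among many A reads
```

Under these assumptions, 50 A's and 39 G's come out as a het even with a strong prior against heterozygosity, because 39 G reads are far too many to explain as sequencing errors on an AA site. A single G among 88 A's is called homozygous AA.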
