7.2 years ago
Joel Wallenius
Hello!
In the output from vcftools '--freq' option, you could get for example:
CHROM POS N_ALLELES N_CHR {ALLELE:FREQ}
1 861276 2 698 A:1 G:0
1 861292 2 698 C:1 G:0
1 861298 2 698 G:1 A:0
1 861315 2 698 G:1 A:0
It looks wonky because of the single tab delimiters, but the point is that at every site the two listed alleles have frequencies of exactly 1 and 0. What does that mean? Why mention the second allele if its frequency is 0?
Is it a rounding problem?
I have 349 individuals, so I'm thinking an allele frequency can't be lower than 1/349, which is far from the double precision float rounding limit...
Grateful for some lightshed! :-]
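For reference, the rounding question can be checked with a little arithmetic. This is only a sketch: it assumes the samples are diploid (which matches N_CHR = 698 for 349 individuals) and that the reported frequency is simply the allele count divided by N_CHR.

```python
# Assumption: diploid samples, so each individual contributes 2 chromosomes.
n_individuals = 349
ploidy = 2

n_chr = n_individuals * ploidy       # matches the N_CHR column (698)
min_nonzero_freq = 1 / n_chr         # one single copy of an allele

print(n_chr)                         # number of chromosomes counted
print(round(min_nonzero_freq, 5))    # smallest frequency a seen allele could have
```

Under those assumptions the smallest possible non-zero frequency is 1/698 rather than 1/349, which is still far too large to be rounded down to 0, so a reported 0 would mean the allele was not observed at all.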
Why couldn't it be 0, if they're all homozygous reference?
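To check how many sites look like this, the '--freq' output can be parsed and the sites where one allele has frequency 1 listed. A rough sketch, using the four rows from the post as input; the function name `parse_frq_line` is made up for illustration and is not part of vcftools.

```python
def parse_frq_line(line):
    """Split one data row of a vcftools --freq table into its fields."""
    fields = line.split()
    freqs = {}
    for item in fields[4:]:
        # Each trailing column looks like ALLELE:FREQ
        allele, freq = item.rsplit(":", 1)
        freqs[allele] = float(freq)
    return {
        "chrom": fields[0],
        "pos": int(fields[1]),
        "n_alleles": int(fields[2]),
        "n_chr": int(fields[3]),
        "freqs": freqs,
    }

# Rows copied from the post above
rows = [
    "1 861276 2 698 A:1 G:0",
    "1 861292 2 698 C:1 G:0",
    "1 861298 2 698 G:1 A:0",
    "1 861315 2 698 G:1 A:0",
]

sites = [parse_frq_line(r) for r in rows]

# Sites where a single allele accounts for every chromosome counted
monomorphic = [s["pos"] for s in sites if max(s["freqs"].values()) == 1.0]
print(monomorphic)
```

All four example sites end up in `monomorphic`, i.e. every counted chromosome carries the first listed allele and the second allele is listed but never observed in these samples.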
Sure, but look at the positions. Some positions (e.g. 861277-861291) are skipped, and I take that to mean 'no alleles found'?
That really depends on how you did the genotyping.
What exactly does that encompass?
I.e., did you use GATK? Samtools + Freebayes? Something else? Was it joint calling? Was the output dataset filtered? I'm trying to determine why there are gaps in your report. A quality filter may have been applied, so that bases failing the quality thresholds were omitted, but without more information it's hard to say.
I don't know how the analysis that produced my source .vcf file was carried out, but the output in my original post is from vcftools.
I'll ask the .vcf file supplier if they know...
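In the meantime, the header of the .vcf file itself may give a hint, since callers and filtering tools often record themselves in the '##' meta-information lines. A minimal sketch, assuming a plain-text VCF; the sample header below is invented for illustration (including the caller name), and a real file may not contain a '##source' line at all.

```python
def header_meta(lines):
    """Collect (key, value) pairs from the leading '##' lines of a VCF."""
    meta = []
    for line in lines:
        if not line.startswith("##"):
            break  # reached the '#CHROM' column header or the data rows
        key, _, value = line[2:].rstrip("\n").partition("=")
        meta.append((key, value))
    return meta

# Hypothetical header, standing in for the real file
sample = [
    "##fileformat=VCFv4.1\n",
    "##source=hypotheticalCaller\n",
    '##FILTER=<ID=LowQual,Description="Low quality">\n',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t861276\t.\tA\tG\t.\tPASS\t.\n",
]

meta = header_meta(sample)

# Skip the bulky per-field definitions and show the rest
provenance = [(k, v) for k, v in meta if k not in ("INFO", "FORMAT", "contig")]
for key, value in provenance:
    print(key, value)
```

For a real file, an open file handle can be passed instead of the list, e.g. `header_meta(open(path))`, or `gzip.open(path, "rt")` for a compressed .vcf.gz.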