Question

Need Help Interpreting The Genotype Fields In VCF-Formatted Data

4

11.8 years ago

CrazyB ▴ 280

Need some help to understand VCF files (yes, I've read the info from 1kgenome and have some "basic" understanding of them).

In the genotype result, for example, I have the following SNP identified.

chr1    860461    G    A    98    PASS   GT:CQ:DP     1/1:98:4    ./.:98:5    ./.:98:5    0/0:.:.

The genotypes for the 4 individuals are

AA, ?, ?, GG
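For reference, the mapping from the GT field to those letters can be sketched in a few lines of Python. This is only an illustration: `decode_genotypes` is a made-up helper name, and it assumes the usual VCF convention that allele index 0 is REF, 1 is the first ALT, and `.` is a missing call.

```python
def decode_genotypes(ref, alt, fmt, samples):
    """Translate each sample's GT field into allele letters.

    Assumes standard VCF conventions: 0 = REF, 1..n = ALT alleles,
    '.' = no call. Hypothetical helper, not part of any library.
    """
    alleles = [ref] + alt.split(",")
    keys = fmt.split(":")
    decoded = []
    for sample in samples:
        # Pair each FORMAT key with the matching per-sample value.
        fields = dict(zip(keys, sample.split(":")))
        gt = fields.get("GT", "./.")
        # Genotypes may be unphased ('/') or phased ('|').
        indices = gt.replace("|", "/").split("/")
        if any(i == "." for i in indices):
            decoded.append("?")
        else:
            decoded.append("".join(alleles[int(i)] for i in indices))
    return decoded


# Values copied from the record above.
ref, alt, fmt = "G", "A", "GT:CQ:DP"
samples = ["1/1:98:4", "./.:98:5", "./.:98:5", "0/0:.:."]
result = decode_genotypes(ref, alt, fmt, samples)
print(result)  # ['AA', '?', '?', 'GG']
```

Note that the letters come only from GT; the other fields (CQ, DP) play no part in the decoding, which is why individual #4 still reads as GG even though its other fields are ".".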

My questions are

1. Why is the genotype for individual #1 "readable" and called AA with a read depth (DP) of 4, whereas individual #2 has a read depth of 5 but its genotype cannot be called, hence ./.?
2. Why is individual #4 still given a predicted GG genotype when nothing is readable?
3. How did this SNP end up being given a "PASS" by the filter? To me, all 4 individuals have poor reads at this position.

Any help? Great many thanks

exome-sequencing vcf • 2.9k views

ADD COMMENT • link updated 11.8 years ago by Gabriel R. ★ 2.9k • written 11.8 years ago by CrazyB ▴ 280

1

Which variant caller produced this VCF? It might be helpful to view the output from bam-readcount for this position for all four of your bam files to understand exactly what reads support which bases and what the quality of those bases are.

ADD REPLY • link updated 5.5 years ago by Ram 45k • written 11.8 years ago by Malachi Griffith 20k

0

Thanks a lot for the feedback. I will follow your lead and ask my co-worker for the info. Have to apologize though for not being familiar with the jargon. This VCF came out of a medical center's genomics core facility and I believe they used the "standard" GATK from Broad for sequencing analysis (and made the call). Is this what you were asking?

ADD REPLY • link 11.8 years ago by CrazyB ▴ 280

Ram · Answer 1 · 2013-06-06

Not sure, but here goes:

1. My guess is that the base qualities are pretty bad. Check the BAM files using samtools mpileup.
2. GATK has a prior on seeing the reference. Sometimes it does not produce certain fields for homozygous-reference sites, depending on the version and the # of bulls sacrificed prior to running GATK.
3. Ask the GATK developers. It's a terrible "software" and genotyper.
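On the base-quality point, here is a rough Python sketch of how one line of samtools mpileup output could be tallied to see how many of the reads at a position carry a usable base quality. It assumes the plain six-column, single-sample mpileup layout with Phred+33 qualities. The function name, the quality cutoff of 20, and the example line are all made up for illustration; the example line is not taken from the poster's BAM files.

```python
import re


def summarize_pileup(line, min_baseq=20):
    """Count base calls in one mpileup line and how many reach min_baseq.

    Assumes columns: chrom, pos, ref, depth, read bases, base qualities
    (Phred+33). Returns {base: (total_reads, reads_with_good_quality)}.
    """
    chrom, pos, ref, depth, bases, quals = line.rstrip("\n").split("\t")[:6]
    # Drop read-start markers ('^' plus a mapping-quality char) and read-end '$'.
    bases = re.sub(r"\^.", "", bases).replace("$", "")
    # Drop indel descriptions such as +2AG or -1t; the number gives their length.
    cleaned = []
    i = 0
    while i < len(bases):
        if bases[i] in "+-":
            length = re.match(r"\d+", bases[i + 1:]).group()
            i += 1 + len(length) + int(length)
            continue
        cleaned.append(bases[i])
        i += 1
    counts = {}
    for base, qual in zip(cleaned, quals):
        # '.' and ',' mean "matches the reference" (forward / reverse strand).
        call = ref.upper() if base in ".," else base.upper()
        phred = ord(qual) - 33
        total, good = counts.get(call, (0, 0))
        counts[call] = (total + 1, good + (phred >= min_baseq))
    return counts


# Made-up example line: five reads, most with very low base quality.
example = "chr1\t860461\tG\t5\tA$a.^FAa\t#I5##"
summary = summarize_pileup(example)
print(summary)  # {'A': (4, 1), 'G': (1, 1)}
```

In this invented case the depth is 5, yet only one A and one G read reach the cutoff, which is the kind of situation where a caller might decline to make a call despite a seemingly adequate DP.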