Need help interpreting the genotype fields in VCF-formatted data
11.6 years ago
CrazyB ▴ 280

I need some help understanding VCF files (yes, I've read the documentation from the 1000 Genomes Project and have a "basic" understanding of the format).

In the genotype results, for example, I have the following SNP:

chr1    860461    G    A    98    PASS   GT:CQ:DP     1/1:98:4    ./.:98:5    ./.:98:5    0/0:.:.

The genotypes for the 4 individuals are

AA, ?, ?, GG
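For readers following along, here is a minimal Python sketch of how the GT values map to those genotypes. It assumes the record exactly as posted above, which is abbreviated (a full VCF line also carries ID and INFO columns), and the helper name `decode_gt` is made up for illustration.

```python
# The abbreviated record from the question (ID and INFO columns are absent as posted).
record = "chr1 860461 G A 98 PASS GT:CQ:DP 1/1:98:4 ./.:98:5 ./.:98:5 0/0:.:."

fields = record.split()
chrom, pos, ref, alt, qual, filt, fmt = fields[:7]
samples = fields[7:]

# Allele index 0 is REF; indices 1 and up are the ALT alleles in order.
alleles = [ref] + alt.split(",")
keys = fmt.split(":")


def decode_gt(gt):
    """Translate a GT string such as '1/1' into bases; '.' means no call."""
    sep = "|" if "|" in gt else "/"
    calls = gt.split(sep)
    if any(c == "." for c in calls):
        return "?"
    return "".join(alleles[int(c)] for c in calls)


decoded = []
for i, sample in enumerate(samples, start=1):
    # Pair each FORMAT key with the value at the same position in the sample column.
    values = dict(zip(keys, sample.split(":")))
    genotype = decode_gt(values["GT"])
    decoded.append(genotype)
    print(f"individual {i}: GT={values['GT']} -> {genotype}, DP={values['DP']}")
```

The FORMAT column (`GT:CQ:DP`) names the colon-separated values in each sample column, so `0/0:.:.` means a homozygous-reference call with the other two values missing.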

My questions are

  1. Why is the genotype for individual #1 "readable" and called AA with a read depth (DP) of 4, whereas for individual #2 the read depth is 5 but the genotype cannot be called and is reported as ./.?
  2. Why is individual #4 given a predicted GG genotype when none of the other fields are readable?
  3. How did this SNP end up being given a "PASS" by the filter? To me, all 4 individuals have poor read coverage at this position.

Any help? Many thanks.

exome-sequencing vcf • 2.8k views

Which variant caller produced this VCF? It might be helpful to view the output from bam-readcount at this position for all four of your BAM files, to understand exactly which reads support which bases and what the quality of those bases is.


Thanks a lot for the feedback. I will follow your lead and ask my co-worker for the info. I apologize for not being familiar with the jargon. This VCF came out of a medical center's genomics core facility, and I believe they used the "standard" GATK from the Broad for the sequencing analysis (and to make the calls). Is this what you were asking?

11.6 years ago
Gabriel R. ★ 2.9k

Not sure, but here goes:

  1. My guess is that the base qualities are probably pretty bad. Check the BAM files using samtools mpileup.
  2. GATK has a prior on seeing the reference. Sometimes it does not produce certain fields for homozygous-reference sites, depending on the version and the # of bulls sacrificed prior to running GATK.
  3. Ask the GATK developers. It's a terrible "software" and genotyper.
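
To make point 1 concrete, here is a rough Python sketch of how the base-quality column of a samtools mpileup line could be inspected. The pileup line is invented for illustration (it is not from the poster's data), the Q20 cutoff is an arbitrary assumption, and `phred33_to_scores` is a hypothetical helper name. It only shows how to read the qualities; it does not confirm the guess.

```python
def phred33_to_scores(qual_string):
    """Convert a Phred+33 ASCII quality string into integer quality scores."""
    return [ord(ch) - 33 for ch in qual_string]


# Hypothetical pileup line: chrom, position, ref base, depth, read bases, base qualities.
pileup_line = "chr1\t860461\tG\t5\tAAaA.\t##I#5"

chrom, pos, ref, depth, bases, quals = pileup_line.split("\t")
scores = phred33_to_scores(quals)

# Count bases below an assumed cutoff of Q20.
low_quality = sum(1 for s in scores if s < 20)
print(f"{chrom}:{pos} depth={depth} qualities={scores} below_Q20={low_quality}")
```

In this made-up line, three of the five bases fall below Q20. That kind of pattern could explain a no-call despite a nominal depth of 5, if the caller discounts low-quality bases.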