Question

How To Interpret DP Fields For Samples In VCF Files?

11.5 years ago

Luca Beltrame ▴ 240

I'm doing a matched comparison of samples, and I'm trying to filter the results by depth. However, I'm not sure how to use the per-sample DP field.

Let's take an example: suppose we have matched Sample A and Sample B, and at a particular locus we have a mutation (SNP).

Case 1

DP for Sample A reports 10
DP for Sample B reports 15

Case 2

DP for Sample A reports 10
DP for Sample B reports none (no DP in genotype)

My problem is how to interpret Case 2 (and similar scenarios, e.g. Sample A having no DP). Given that the per-sample DP (at least as reported by the GATK) counts only reads that pass the quality control filters, which of these scenarios is most likely here?

1. Nothing can be done: the locus for that specific sample may be wild type or not, but the filtered read depth is not sufficient to determine that (in R terms, this would mean NA).
2. The locus is assumed wild type due to lack of supporting information (reads).
3. A wild type locus does not have DP information.

This matters to me because I'm currently keeping only matched samples where DP is present in both and higher than a threshold, and I wonder whether I'm being too restrictive.

For reference, these results refer to indels generated with the GATK's UnifiedGenotyper in indel mode.
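The filter described above (DP present in both samples and above a threshold) can be sketched roughly as follows. This is a minimal sketch in plain Python, not GATK's own logic: the function names `sample_depths` and `passes_depth_filter`, the `min_dp` threshold, and the two example records are all made up for illustration. It assumes a tab-separated VCF data line with the FORMAT column at index 8, and it treats a missing or `.` DP as `None` rather than 0, so that "no information" stays distinguishable from "zero reads".

```python
def sample_depths(vcf_line):
    """Return the per-sample DP values of one VCF data line (None if absent)."""
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")      # e.g. ["GT", "AD", "DP"]
    depths = []
    for sample in fields[9:]:
        # Trailing sample fields may be dropped (e.g. just "./."),
        # so zip() simply stops early and DP is then not in the dict.
        values = dict(zip(format_keys, sample.split(":")))
        raw = values.get("DP", ".")
        depths.append(None if raw in (".", "") else int(raw))
    return depths


def passes_depth_filter(vcf_line, min_dp=8):
    """True only if every sample has a DP and it is >= min_dp (hypothetical threshold)."""
    return all(dp is not None and dp >= min_dp for dp in sample_depths(vcf_line))


# Made-up records mirroring Case 1 and Case 2 from the question.
case1 = "chr1\t1000\t.\tA\tAT\t50\tPASS\t.\tGT:DP\t0/1:10\t0/1:15"
case2 = "chr1\t1000\t.\tA\tAT\t50\tPASS\t.\tGT:AD:DP\t0/1:5,5:10\t./."

print(sample_depths(case1), passes_depth_filter(case1))
print(sample_depths(case2), passes_depth_filter(case2))
```

Keeping the missing DP as `None` makes it easy to relax the filter later, for example by reporting Case 2 loci separately instead of silently discarding them.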

vcf sequencing • 3.9k views

updated 9.2 years ago by Biostar 20 • written 11.5 years ago by Luca Beltrame ▴ 240

You could check the pileup at that particular locus to confirm that the issue comes from a lack of reads spanning that genomic location in that sample. If that is the case, I presume you cannot make a direct comparison for this SNP between the two samples.

11.5 years ago by Vivek ★ 2.7k

score 1 · Answer 1 · 2013-05-24

Do you have your samples in one .vcf file or in separate .vcf files? I am going to assume they are all in one .vcf file. In practical terms, scenarios 1 and 3 are the same: whether you have 0 reads or 2 reads, nothing can be concluded from them, so the result is the same. If your genotype is shown as ./. , that means it hasn't been called at all.

I have not heard of scenario 2 happening. I have only recently started using GATK, but I don't think it happens.
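To make the distinction between "not called" and "called wild type" concrete, here is a small sketch that classifies a single sample column by its GT value. It assumes a multi-sample VCF where GT follows the usual conventions (`.` for a missing allele, `0` for the reference allele, `/` or `|` as separators). The function name `call_status` and the three labels are hypothetical, and I have not checked this against what UnifiedGenotyper actually emits in every case.

```python
import re


def call_status(format_field, sample_field):
    """Classify one sample column as 'no_call', 'hom_ref' or 'variant'."""
    keys = format_field.split(":")
    values = dict(zip(keys, sample_field.split(":")))
    gt = values.get("GT", ".")
    alleles = re.split(r"[/|]", gt)       # handles unphased and phased genotypes
    if all(a == "." for a in alleles):
        return "no_call"                  # ./. : nothing was called for this sample
    if all(a == "0" for a in alleles):
        return "hom_ref"                  # an explicit wild type call
    return "variant"                      # at least one non-reference (or partially missing) allele


print(call_status("GT:AD:DP", "./."))
print(call_status("GT:DP", "0/0:12"))
print(call_status("GT:DP", "0/1:10"))
```

With something like this, a sample showing ./. can be reported as "unknown" (scenario 1) rather than being counted as wild type.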

Hope this helps.