Need help understanding read depth for somaticsniper vcf's
0
1
Entering edit mode
10.4 years ago
Jordan ★ 1.3k

Hi,

I'm having trouble understanding read depth in vcf's called by somatic-sniper.

Here is a sample vcf file:

##fileformat=VCFv4.1
##fileDate=20140428
##phasing=none
##reference=human_hg19.fa
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=IGT,Number=1,Type=String,Description="Genotype when called independently (only filled if called in joint prior mode)">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Total read depth">
##FORMAT=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
##FORMAT=<ID=BCOUNT,Number=4,Type=Integer,Description="Occurrence count for each base at this site (A,C,G,T)">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype quality">
##FORMAT=<ID=JGQ,Number=1,Type=Integer,Description="Joint genotype quality (only filled if called in join prior mode)">
##FORMAT=<ID=VAQ,Number=1,Type=Integer,Description="Variant allele quality">
##FORMAT=<ID=BQ,Number=.,Type=Integer,Description="Average base quality">
##FORMAT=<ID=MQ,Number=1,Type=Integer,Description="Average mapping quality across all reads">
##FORMAT=<ID=AMQ,Number=.,Type=Integer,Description="Average mapping quality for each allele present in the genotype">
##FORMAT=<ID=SS,Number=1,Type=Integer,Description="Variant status relative to non-adjacent Normal, 0=wildtype,1=germline,2=somatic,3=LOH,4=unknown">
##FORMAT=<ID=SSC,Number=1,Type=Integer,Description="Somatic Score">
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    Normal    Tumor
chr1    871092    .    G    C    .    .    .    GT:IGT:DP:DP4:BCOUNT:GQ:JGQ:VAQ:BQ:MQ:AMQ:SS:SSC    0/0:0/0:6:5,1,0,0:0,0,6,0:45:.:0:27:38:38:0:.    0/1:0/1:12:8,1,6,0:0,5,9,0:25:.:25:18,23:47:45,46:2:22
chr1    892349    .    G    T    .    .    .    GT:IGT:DP:DP4:BCOUNT:GQ:JGQ:VAQ:BQ:MQ:AMQ:SS:SSC    0/0:0/0:10:9,0,4,0:0,0,9,4:30:.:0:19:36:36:0:.    0/1:0/1:29:24,0,17,0:0,0,24,17:19:.:19:17,10:35:36,33:2:16

The depth of the first variant in tumor is 12. But based on the BCOUNT, it has 15 depth (8+1+6+0). Why this discrepancy? Does anyone know?

And how can I accurately calculate the allele frequency in this case.

For the first variant, the alt allele is C. As per the BCOUNT there is just one 1 C. So, is it 1/15 or 1/12?

It is also a bit concerning that C was called a variant considering that it only has 1 read supporting it, where as A has 6 reads supporting it.

Any help on this is much appreciated. Thanks!

somaticsniper vcf variant-calling depth • 4.7k views
ADD COMMENT
0
Entering edit mode

I'm a bit baffled why this might be happening. I do think it likely this is a bug. The tumor depth should match the DP4 counts. I do see what I think is a bug in how the BCOUNT is calculated. My best guess for DP4 is that there might be some Ns in your reads? Would it be possible to see samtools mpileup or bam-readcount output for these issues?

ADD REPLY
0
Entering edit mode

Yes, my first guess was that the reads would have had Ns in it as well. But shouldn't the DP be higher than BCOUNT in that case? As BCOUNT gives only ACGTs not Ns. Though that does not seem to be the case here.

My reads do have Ns in them btw.

ADD REPLY
0
Entering edit mode

My guess would be 3 reads with Ns for 871092? Is that right?

One reason the BCOUNT seems to make no sense is that you're actually looking at the DP4 counts. It still won't make sense because there are bugs relating to how the counts are generated. Ns are being double counted in all categories and are thus inflating the non-DP counts. I'll try to get a fix out before the end of the week.

ADD REPLY
0
Entering edit mode

I looked at this position. So, here are the results:

Total reads at that position: 12
Total N's: 3
Total G's: 6
Total C's: 2
Total T's: 1
Total A's: 0

So, I think the depth is working right. And like you said, there are 3 Ns. But the BCOUNT is way off.

The BCOUNT should be: (0, 2, 6, 1).

ADD REPLY
1
Entering edit mode

BCOUNT for your example is (0,5,9,0). There are two bugs going on here.

  • BCOUNT is being limited to only bases seen in called tumor and normal genotypes.
  • The limitation in 1 is only being applied in a way that non-ACTG bases can count in multiple categories.

Thus, your T isn't being counted (it isn't in the called genotypes) and your 3 Ns are being added to both your Gs and your Cs.

ADD REPLY
0
Entering edit mode

I see. You said you would fix by the end of the week. Can you let me know when u r done? I will probably re-run the tool then on my samples after the update.

ADD REPLY
1
Entering edit mode

I realize it's been about a year, which is sad. In the event that you are still interested, the latest version (1.0.5.0) should fix the bug(s) you discovered.

ADD REPLY

Login before adding your answer.

Traffic: 2516 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6