Why does bcftools call show variants with genotype 0/0 ?
1
0
Entering edit mode
8.1 years ago

Hello,

it might be a stupid question but I can't find an answer on the dedicated manuals or on the discussion pages. I was wondering: why does the bcftools call program-function call variants and then assign to them a genotype 0/0 (which is reference/reference)? How is that a variant it it is identical to the reference?

I repeat, it might be a very stupid question but I need to have a clear answer on this in order to justify some results that I have.

Thank you!

EDIT: One example line

Scf00002        19      .       C       .       66      .       DP=8;MQ0F=0.625;AN=4;DP4=7,0,0,0;MQ=4   GT      0/0     0/0
genotype reference bcftools call variant calling • 2.6k views
ADD COMMENT
0
Entering edit mode

how many samples ? what was your command line ?

ADD REPLY
0
Entering edit mode

2 samples.

EDIT:

samtools mpileup -g -f reference.fa sample1.ali.bam sample2.ali..bam > raw.bcf

bcftools view raw.bcf | vcfutils.pl varFilter -d 5 -D 200 -2 0 -Q 25 -W 30 -w 20 | bcftools view -O b > var.bcf

bcftools call -m --threads 16 -o call.vcf -O b var.bcf
ADD REPLY
5
Entering edit mode
8.1 years ago

you need the -v option in bcftools call to report only variants. see here.

ADD COMMENT
0
Entering edit mode

Thank you. It seems to be the one.

ADD REPLY
0
Entering edit mode

but what is the point of mentioning it in the first place? (I think it is not informative)

ADD REPLY
0
Entering edit mode

I found it informative, after all. I came to know how many potentially variant position I had, which were not because my genotype thresholds were too strict (I am saying this 8 months afterwards). I helps you out adjusting your thresholds according to what you find. In my case, the genome on which I study is allotetraploid so a threshold of say 50% of reads calling a variant might be too high. That would end in the 0/0 genotype but you would see it anyway, and when you plot the coverage of all the variants you see a strong peak at say 20% (random number). If you store only variant positions you might not see it and think there is no such thing.

ADD REPLY
0
Entering edit mode

sorry I may miss understand the answer, by saying 50% of reads are we talking about the percentage of reads that aligned to SNP and supporting the variant? and the other half (the other 50% supporting that there is no variant?)

ADD REPLY
1
Entering edit mode

Yes. The thing is that all the 0/0 positions are potentially variant, with the right thresholds / sets / biological conditions. They give you an estimation of what you're leaving out that could be relevant. If you call genotypes on an allotetraploid genome, for example, where it can be diploid and tetraploid all together, you have to set a lower threshold of %ALT to call a 0/1 genotype because you might be in a tetraploid region where your genotype is actually 0/0/0/1 and you have only 25% ALT reads. Using the default diploid settings and not printing out the 0/0 positions I was ignoring the existance of a strong peak at 20% ALT reads, which is likely indicating a tetraploid region.

ADD REPLY

Login before adding your answer.

Traffic: 3666 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6