How to differentiate Homozygous reference calls from No calls in VCF
1
0
Entering edit mode
7.9 years ago
second_exon ▴ 210

I'm using samtools multisample SNP calling to call SNPs from different samples. The command I'm using is,

samtools mpileup -uf ref.fa *.bam | bcftools view -vcg - > out.vcf

Then, I'm using vcf-to-tab tool to convert vcf to tab file.

The problem is, No calls (missed in sequencing) are called as homozygous reference calls. How to differentiate No calls from homozygous reference calls.

I appreciate any sort of help.

Edit: My vcf file looks like this:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  samp1.bam   samp2.bam 000000F   634700  .   T   C   3.55    .   DP=1;AF1=1;AC1=4;DP4=0,0,0,1;MQ=60;FQ=-27.4 GT:PL:GQ    0/1:31,3,0:5    0/1:0,0,0:3
snps samtools • 4.3k views
ADD COMMENT
0
Entering edit mode

Generally, you'd do that by depth. If the depth for that sample is below your criteria for safely considering something homozygous-ref at a specific position, then consider it a no-call.

ADD REPLY
0
Entering edit mode

Thanks, Brain. That's a very good idea. But, raw read depth is not printed for individual samples, how to force bcftools to print raw read depth?

ADD REPLY
0
Entering edit mode

Ah, that's kind of unfortunate. I wonder if there's an option for that somewhere? I'm not all that familiar with bcftools, but for some reason depth and allele fraction do not seem to be default fields for each sample. Maybe you're supposed to guess from the PL field ("phred-scaled genotype likelihoods"); since they are all zero for the second sample, the implication is that the depth is zero, even though it was called 0/1 (het). Well, that doesn't make much sense!

One alternative is to use a different tool to calculate coverage at that location for that sample. Bedtools probably allows that, and BBMap's pileup.sh does as well. Not a convenient option, though, for sure.

ADD REPLY
1
Entering edit mode
7.9 years ago
d-cameron ★ 2.9k

No calls (missed in sequencing) are called as homozygous reference calls. How to differentiate No calls from homozygous reference calls.

According to the bcftools documentation:

-g, --gvcf INT  output also gVCF blocks of homozygous REF calls. The parameter INT is the minimum per-sample depth required to include a site in the non-variant block.

Looking at your command line, I notice that you haven't specified a minimum per-sample depth for the -g parameter. Try setting that parameter to a sane, non-zero value and you should be able to differentiate between no-call and homozygous REF.

ADD COMMENT

Login before adding your answer.

Traffic: 2842 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6