Hi!
I am stuck interpreting a variant call I made on two BAM files (two samples), say sample1 and sample2, for a specific region.
The commands I used:
samtools mpileup -uf hg19.fa sample1.bam sample2.bam -r Chromosomal_Region | bcftools view -bvcg - > var.raw.bcf
bcftools view var.raw.bcf | vcfutils.pl varFilter -D100 > var.flt.vcf
I get a result which looks like this:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1 Sample2
chrxyz 74311283 0 A G 4.61 0 DP=2;VDB=0.0465;AF1=1;AC1=4;DP4=0,0,0,2;MQ=37;FQ=-28.7 GT:PL:DP:GQ 0/1:0,0,0:0:3 1/1:34,6,0:2:4
chrxyz 74311467 0 G A 70.2 0 DP=3;VDB=0.0442;AF1=1;AC1=4;DP4=0,0,3,0;MQ=37;FQ=-32.3 GT:PL:DP:GQ 1/1:67,6,0:2:12 1/1:37,3,0:1:9
Please ignore the value in the chromosome column. The FILTER column shows 0, and I don't know whether these calls can be trusted or should be discarded. My gut feeling and my limited knowledge of SNP calling suggest keeping the second SNP and following it up with the variant_effect_predictor from Ensembl.
Any help interpreting these results would be appreciated, and suggestions for further analysis (e.g. in silico analysis of these variants) are also welcome.
Thank you
Given the sequencing depth (at most 2 in any sample), I'd hesitate to follow up on either of those.
Could you please elaborate on these terms? How did you know the sequencing depth in these regions: was it from DP? Which terms indicate the mapping quality, and which are the critical terms in this result? Kindly share your knowledge. Thank you
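To see where the "at most 2 in any sample" figure comes from, here is a minimal sketch in awk (wrapped in a hypothetical shell function, `per_sample_depth`) that prints GT, DP and GQ for every sample of every record. It assumes a standard VCF layout (FORMAT in column 9, samples from column 10 on) and looks up the fields by name from the FORMAT string rather than assuming GT:PL:DP:GQ order.

```shell
# Hypothetical helper: print GT, DP and GQ per sample for each VCF record.
per_sample_depth() {
  awk '
    # remember sample names from the #CHROM header line, if present
    /^#CHROM/ { for (s = 10; s <= NF; s++) name[s] = $s; next }
    /^#/ { next }
    {
      n = split($9, keys, ":")            # FORMAT keys, e.g. GT:PL:DP:GQ
      for (s = 10; s <= NF; s++) {
        split($s, vals, ":")              # values for this sample, same order
        split("", f)                      # clear the previous sample values
        for (k = 1; k <= n; k++) f[keys[k]] = vals[k]
        label = (s in name) ? name[s] : "sample" (s - 9)
        printf "%s\t%s\t%s\tGT=%s\tDP=%s\tGQ=%s\n", $1, $2, label, f["GT"], f["DP"], f["GQ"]
      }
    }' "$@"
}

# Demo on the two records from the question (awk splits on tabs or spaces).
per_sample_depth <<'EOF'
chrxyz 74311283 0 A G 4.61 0 DP=2;VDB=0.0465;AF1=1;AC1=4;DP4=0,0,0,2;MQ=37;FQ=-28.7 GT:PL:DP:GQ 0/1:0,0,0:0:3 1/1:34,6,0:2:4
chrxyz 74311467 0 G A 70.2 0 DP=3;VDB=0.0442;AF1=1;AC1=4;DP4=0,0,3,0;MQ=37;FQ=-32.3 GT:PL:DP:GQ 1/1:67,6,0:2:12 1/1:37,3,0:1:9
EOF
```

On a real file it would be run as `per_sample_depth var.flt.vcf`. For the two records above it reports per-sample DP of 0 and 2 at 74311283 and 2 and 1 at 74311467. Note that the 0/1 call for sample1 at the first site has DP=0 and PL=0,0,0, i.e. no reads support it.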
Sure (BTW, read ashutoshmits's answer, which is quite good!). These fields are normally defined in the header portion of the VCF file (maybe that isn't printed by this procedure; I usually use GATK).
DP is the depth; the value in the INFO column is the sum of the depths for each of the samples (see the DP part of each sample's column, where the parameters are ":"-separated). A fuller description of the VCF fields from samtools is available on the samtools website (scroll down to "Understanding the output: the VCF/BCF format").
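The relationship between the INFO DP and the per-sample DP values can be checked directly. Below is a minimal awk sketch (the function name `info_vs_sample_dp` is hypothetical) that extracts `DP=` from the ";"-separated INFO column, finds DP by name in the FORMAT column, and sums it over all samples. The two numbers are not guaranteed to agree at every site in general; the `##INFO` and `##FORMAT` header lines describe what each DP actually counts.

```shell
# Hypothetical helper: compare INFO DP with the sum of per-sample FORMAT DP.
info_vs_sample_dp() {
  awk '!/^#/ {
    # INFO is ";"-separated; take the value of the DP= entry (skips DP4=...)
    info_dp = ""
    m = split($8, info, ";")
    for (i = 1; i <= m; i++)
      if (info[i] ~ /^DP=/) info_dp = substr(info[i], 4)
    # locate DP inside the ":"-separated FORMAT column (column 9)
    n = split($9, keys, ":")
    pos = 0
    for (k = 1; k <= n; k++)
      if (keys[k] == "DP") pos = k
    # add up that field over every sample column (10 onwards)
    sum = 0
    for (s = 10; s <= NF; s++) {
      split($s, vals, ":")
      if (pos > 0) sum += vals[pos]
    }
    printf "%s\t%s\tINFO_DP=%s\tsum_FORMAT_DP=%d\n", $1, $2, info_dp, sum
  }' "$@"
}

# Demo on the two records from the question.
info_vs_sample_dp <<'EOF'
chrxyz 74311283 0 A G 4.61 0 DP=2;VDB=0.0465;AF1=1;AC1=4;DP4=0,0,0,2;MQ=37;FQ=-28.7 GT:PL:DP:GQ 0/1:0,0,0:0:3 1/1:34,6,0:2:4
chrxyz 74311467 0 G A 70.2 0 DP=3;VDB=0.0442;AF1=1;AC1=4;DP4=0,0,3,0;MQ=37;FQ=-32.3 GT:PL:DP:GQ 1/1:67,6,0:2:12 1/1:37,3,0:1:9
EOF
```

For these two records the INFO DP (2 and 3) equals the sum of the per-sample DP values (0+2 and 2+1).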