Hello: I have produced a set of variants using the following pipeline:
## SORT BAM FILE FROM REF MAPPING ##
samtools sort r.bam r_sorted
## CREATE LIST OF POTENTIAL SNP OR INDEL ##
samtools mpileup -uf ref.fa r_sorted.bam > r.bcf
## PARSE POTENTIAL SNP OR INDEL USING BAYESIAN INFERENCE ##
bcftools view -bvcg r.bcf > r2.bcf
## BCF FILE IS CONVERTED TO VIEWABLE FORM ##
bcftools view r2.bcf > r.vcf
I want to do a preliminary filtering of the resulting variants based on sequencing depth. So I have written a short script that does this using the DP4 values from the INFO column.
gi|110645304|ref|NC_002516.2| 314283 . G T 222 . DP=67;VDB=0.0384;AF1=1;AC1=2;**DP4=0,1,31,27**;MQ=58;FQ=-169;PV4=0.47,1,0.37,1 GT:PL:GQ 1/1:255,142,0:99
In this example, I would add the first two values of DP4 (reads supporting the reference allele) and make sure their sum is low enough. I would also add the last two values of DP4 (reads supporting the alternate allele, i.e. the SNP coverage) and make sure the site is neither under-covered nor over-covered. My question is whether this is a reasonable approach.
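To make the idea concrete, here is a minimal Python sketch of the filter described above. It is not my actual script, just an illustration. It assumes a tab-delimited VCF with INFO in the eighth column, and that DP4 lists forward/reverse reference counts followed by forward/reverse alternate counts. The function names and the thresholds (`max_ref`, `min_alt`, `max_alt`) are hypothetical placeholders that would need tuning to the real coverage.

```python
def parse_dp4(info):
    """Return (ref_fwd, ref_rev, alt_fwd, alt_rev) from an INFO string, or None."""
    for field in info.split(";"):
        if field.startswith("DP4="):
            return tuple(int(x) for x in field[len("DP4="):].split(","))
    return None


def passes_depth_filter(vcf_line, max_ref=2, min_alt=10, max_alt=200):
    """Check one VCF record against hypothetical depth thresholds."""
    cols = vcf_line.rstrip("\n").split("\t")
    dp4 = parse_dp4(cols[7])          # INFO is the 8th VCF column
    if dp4 is None:
        return False                  # no DP4 tag: treat as failing
    ref_depth = dp4[0] + dp4[1]       # reads supporting the reference allele
    alt_depth = dp4[2] + dp4[3]       # reads supporting the alternate allele
    return ref_depth <= max_ref and min_alt <= alt_depth <= max_alt


def filter_vcf(lines, **thresholds):
    """Yield header lines unchanged and only the records that pass."""
    for line in lines:
        if line.startswith("#") or passes_depth_filter(line, **thresholds):
            yield line


# The record from this post, rebuilt with tab separators.
example = "\t".join([
    "gi|110645304|ref|NC_002516.2|", "314283", ".", "G", "T", "222", ".",
    "DP=67;VDB=0.0384;AF1=1;AC1=2;DP4=0,1,31,27;MQ=58;FQ=-169;PV4=0.47,1,0.37,1",
    "GT:PL:GQ", "1/1:255,142,0:99",
])
print(parse_dp4(example.split("\t")[7]), passes_depth_filter(example))
```

For the record above, the reference depth is 0 + 1 and the alternate depth is 31 + 27, so it passes with these placeholder thresholds. On a whole file, `filter_vcf(open("r.vcf"))` would yield the header lines plus the records that pass.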
For more information: I am working with HiSeq sequencing data from a single sample, and it is a bacterial whole genome.
Thanks