Filtering VCF Variants Based on Sequencing Coverage
11.8 years ago
Juliofdiaz ▴ 140

Hello: I have produced a set of variants using the following pipeline:

## SORT BAM FILE FROM REF MAPPING ##
samtools sort r.bam r_sorted
## CREATE LIST OF POTENTIAL SNP OR INDEL ##
samtools mpileup -uf ref.fa r_sorted.bam > r.bcf
## PARSE POTENTIAL SNP OR INDEL USING BAYESIAN INFERENCE ##
bcftools view -bvcg r.bcf > r2.bcf
## BCF FILE IS CONVERTED TO VIEWABLE FORM ##
bcftools view r2.bcf > r.vcf

I want to do a preliminary filtering of the resulting variants based on sequencing depth. So, I have written a short script that does so by using the DP4 values from the INFO column.

gi|110645304|ref|NC_002516.2|    314283    .    G    T    222    .    DP=67;VDB=0.0384;AF1=1;AC1=2;DP4=0,1,31,27;MQ=58;FQ=-169;PV4=0.47,1,0.37,1    GT:PL:GQ    1/1:255,142,0:99

In this example I would add the first two values of the DP4 (reference coverage) and make sure that the sum is low enough. I would also add the last two values of the DP4 (SNP coverage) and make sure that the sum is neither undercovered nor overcovered. I guess my question is whether this is an OK approach.
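
The check described above could look roughly like the sketch below. It is only an illustration: the function names and the thresholds (`max_ref`, `min_alt`, `max_alt`) are made-up placeholders, not values recommended anywhere in this thread, and it assumes a tab-delimited VCF line with a DP4 tag in the INFO column.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'DP=67;DP4=0,1,31,27' into a dict."""
    info = {}
    for entry in info_field.split(";"):
        key, _, value = entry.partition("=")
        info[key] = value  # flag-style entries without '=' get an empty string
    return info


def dp4_counts(vcf_line):
    """Return (ref_depth, alt_depth) summed from the DP4 tag of one VCF line."""
    columns = vcf_line.rstrip("\n").split("\t")
    info = parse_info(columns[7])  # INFO is the 8th column of a VCF record
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in info["DP4"].split(","))
    return ref_fwd + ref_rev, alt_fwd + alt_rev


def passes_depth_filter(vcf_line, max_ref=2, min_alt=10, max_alt=200):
    """Hypothetical thresholds: few reference reads, alt depth within a range."""
    ref_depth, alt_depth = dp4_counts(vcf_line)
    return ref_depth <= max_ref and min_alt <= alt_depth <= max_alt


# The example record from this post, rebuilt with tab separators.
example = "\t".join([
    "gi|110645304|ref|NC_002516.2|", "314283", ".", "G", "T", "222", ".",
    "DP=67;VDB=0.0384;AF1=1;AC1=2;DP4=0,1,31,27;MQ=58;FQ=-169;PV4=0.47,1,0.37,1",
    "GT:PL:GQ", "1/1:255,142,0:99",
])

ref_depth, alt_depth = dp4_counts(example)
print(ref_depth, alt_depth, passes_depth_filter(example))
```

For the record above, the reference depth is 0 + 1 and the SNP depth is 31 + 27, so the variant is kept unless the upper bound on SNP depth is set below 58.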

For more information, I am working with HiSeq sequencing data; it is a single sample, and it is a bacterial whole genome.

Thanks

vcf samtools snp • 5.8k views
11.8 years ago

Yes, this is the right approach. By the way, there is a tool called VCFtools (http://vcftools.sourceforge.net/) that can be used to process VCF files (filtering, comparisons, etc.). But you can write your own code too (I use my own code). If you are unsure which parameters to use, you can read recent NGS papers and use the same thresholds they used. Normally, I discard SNPs at positions covered by an over-represented number of reads (more than 3 times the average coverage). Besides the number of reads, mapping quality should also be used to filter the SNPs.
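
As a rough sketch of the two filters mentioned in this answer (not the answerer's own code), the snippet below rejects a record whose INFO `DP` exceeds a multiple of the average coverage or whose `MQ` is too low. The function names, the `min_mq=30` cutoff, and the `mean_coverage` values are assumptions for illustration; the average coverage has to be computed separately and passed in.

```python
def info_value(vcf_line, key):
    """Return the value of one INFO tag from a tab-delimited VCF line, or None."""
    columns = vcf_line.rstrip("\n").split("\t")
    for entry in columns[7].split(";"):  # INFO is the 8th column
        name, _, value = entry.partition("=")
        if name == key:
            return value
    return None


def passes_coverage_and_mq(vcf_line, mean_coverage, max_fold=3.0, min_mq=30.0):
    """Keep a record only if depth <= max_fold * mean_coverage and MQ >= min_mq.

    mean_coverage is supplied by the caller; min_mq is a hypothetical cutoff.
    Records missing either tag are rejected.
    """
    depth = info_value(vcf_line, "DP")
    mq = info_value(vcf_line, "MQ")
    if depth is None or mq is None:
        return False
    return int(depth) <= max_fold * mean_coverage and float(mq) >= min_mq


# The record from the question, rebuilt with tab separators.
example = "\t".join([
    "gi|110645304|ref|NC_002516.2|", "314283", ".", "G", "T", "222", ".",
    "DP=67;VDB=0.0384;AF1=1;AC1=2;DP4=0,1,31,27;MQ=58;FQ=-169;PV4=0.47,1,0.37,1",
    "GT:PL:GQ", "1/1:255,142,0:99",
])

print(passes_coverage_and_mq(example, mean_coverage=60))
print(passes_coverage_and_mq(example, mean_coverage=20))
```

Note that `DP` in the INFO column is the raw read depth, while DP4 counts only high-quality bases, so the two numbers need not match (67 versus 59 in the record above).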
