Maximum Read Depth
12.9 years ago
Anjali ▴ 60

Hi,

I want to call variants from my sequencing data using Samtools (vcfutils). For this I need to specify the maximum read depth in the varFilter function, but I have no information about it for my data. Can anybody tell me how I should go about calculating it, or whether I can rely on the default value to obtain SNPs in VCF format?

Thanking you

vcf read samtools • 5.6k views
12.9 years ago
Doctoroots ▴ 800

Generally, the maximum coverage for variant calling is set out of memory considerations. For this purpose I think the practice recommended for the GATK Unified Genotyper should be appropriate:

"...When running on projects with many samples at low coverage (e.g. 1000 Genomes with 4x coverage per sample) we usually lower this [the maximal coverage] value to about 10 times the average coverage (40x)..."

Good luck.


Is the average coverage the average of the depths of all the variants in a VCF file? Or the average coverage of every mapped base in the exome/genome?

12.9 years ago
Ian 6.1k

I have looked into this before. The '-D' flag lets the program skip regions that have abnormally high coverage, e.g. PCR amplification errors. I calculated the mean/median coverage for the sample using BEDTools, which was also covered in a previous Biostars question.

genomeCoverageBed -d -ibam file_sorted.bam -g genome_seq.fasta > file_sorted.1bp_coverage_inc0

This method includes 0 counts for positions not covered by reads.

You can load the output into R; I used the Perl script by Heikki at the bottom of the above link.

To set '-D' I used a value 3 to 5 times the mean coverage. You'll need to play with this to get the best result for your samples. I saw very little difference in the number of SNPs using any multiplier >3.
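If you would rather not use R or Perl, here is a rough Python sketch of the same calculation. It assumes the per-base file has three whitespace-separated columns (chromosome, position, depth), which is what I'd expect from the genomeCoverageBed -d output above. The function name `depth_summary` and the `include_zero` switch are my own inventions for illustration, not part of any tool, and the 3x/5x multipliers are simply the range mentioned above.

```python
from collections import Counter


def depth_summary(lines, include_zero=True):
    """Summarise per-base depth and suggest candidate '-D' values.

    Assumes each line looks like: chromosome <tab> position <tab> depth.
    A histogram is used so the whole file never has to sit in memory.
    """
    hist = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        depth = int(fields[2])
        if depth == 0 and not include_zero:
            continue  # optionally ignore uncovered positions
        hist[depth] += 1

    n = sum(hist.values())
    if n == 0:
        raise ValueError("no depth values found")

    mean = sum(d * c for d, c in hist.items()) / n

    # Lower median: walk the sorted depths until half the positions are seen.
    midpoint = (n + 1) // 2
    seen = 0
    median = None
    for d in sorted(hist):
        seen += hist[d]
        if seen >= midpoint:
            median = d
            break

    return {
        "mean": mean,
        "median": median,
        "max_depth_3x": round(3 * mean),  # candidate '-D' at 3x mean
        "max_depth_5x": round(5 * mean),  # candidate '-D' at 5x mean
    }


# Example use on the file produced above:
# with open("file_sorted.1bp_coverage_inc0") as fh:
#     print(depth_summary(fh))
```

Passing include_zero=False drops the uncovered positions, which raises the mean; which of the two is more sensible depends on how much of the reference you expect to be covered (e.g. exome vs whole genome).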

Bear in mind SNPs will not be reported for those regions with coverage > '-D'. Setting the value too high will result in too many false positives.

It is also highly recommended to locally realign reads to avoid false positive SNPs caused by the presence of INDELs, e.g. with SRMA or the GATK suite.

12.9 years ago
Pascal ★ 1.5k

Why don't you just run it without setting this parameter? There is a default value set to 1000.
