Mpileup -D And -S
3
1
Entering edit mode
12.1 years ago
GPR ▴ 390

I am calling out SNPs from a high coverage (> 20fold) 75bp paired-end, > 400 million reads data set. I have used a standard BWA>MARK/REMOVE DUPLICARES (PICARD)>SAMTools flowthrough. My samtools mpileup command line is: samtools mpileup -uf genome.fa input-1.bam input-2.bam | bcftools view -bvcg -.> output.bcf bcftools view output.bcf | vcfutils.pl varFilter -D100 > varflt.vcf I did however realize that for high coverage,I should use the variable -S. My question is, is -D100 an ideal threshold? Regarding -S, what's an appropriate/standard value? Many thanks.

mpileup vcf • 3.1k views
ADD COMMENT
0
Entering edit mode
12.1 years ago
Ian 6.1k

In regards to vcfutil.pl varfilter -D I also asked a question about this:

How to interpret -D file in samtools variant calling pipeline?

The thing to be clear about is that any position that exceeds the coverage of '-D' will be excluded from the results. When i run variant detection i usually set '-D' to be 3-5x the mean depth of coverage for the genome. This seems to work well, but still excludes areas where there is artificially high coverage.

Not sure what you mean about '-S'? For which, program?

ADD COMMENT
0
Entering edit mode

Thanks, got a clear idea regarding -D. -S is strand bias p-value in bcftools. I want to add this flag in: bcftools view *.bcf | vcfutils.pl varFilter -D1000(or another value -S (which value???) > output.vcf Since -S is a p-value, I assume 0.05-0.1 would be reasonable, but want to make sure. Thanks.

ADD REPLY
0
Entering edit mode
12.1 years ago

As Ian answered I would set -D to be 3-5 times of the mean coverage. I usually keep it 3X. This way you would discard false postitive SNPs resulting in the CNV regions OR highly similiar regions like paralogs. -S is a p-value but I usually dont care a much about it and use a very low cut off .0001. There have been many papers those have used the same p-value for strand bias.

ADD COMMENT
0
Entering edit mode
12.0 years ago

I wouldn't call 20x high coverage. I'd say high coverage is 100x, at least.

ADD COMMENT

Login before adding your answer.

Traffic: 1899 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6