If you are interested only in SNPs:
samtools mpileup --skip-indels -d 250 -m 1 -E --BCF --output-tags DP,DV,DP4,SP -f <reference genome.fa> -o <output.bcf> <list of input bam files>
bcftools index <output.bcf> <indexed.bcf>
bcftools call --skip-variants indels --multiallelic-caller --variants-only -O v <output.bcf> </code><code>-o </code><output.vcf>
--skip-indels
(samtools) and --skip-variants indels
(bcftools) - report only SNPs
-d 250
- maximum read depth to consider per position
-m 1
- minimum number of gapped reads for an INDEL candidate
-E
- recalculate BAQ on the fly, ignore existing BQ tags
--BCF
- output is bcf file
--output-tags DP,DV,DP4,SP
:
DP (number of reads covering position), DV (number of high-quality variant reads), DP4 (number of forward reference, reverse reference, forward non-reference and reverse non-reference alleles used in variant calling) and SP (phred-scaled strand bias P-value) tags in the output file.
--multiallelic-caller
- call only multi-allelic variants
--variants-only
- use if you are interested to report only potential variant sites and exclude monomorphic ones (sites without alternate alleles)
Ask, if you have any questions. Hope, it helps you :)
Which version of bcftools are you using?
version 1.2
would you be so kind as to explain what the cmd line means exactly? thank you
Thank you all for your help - I managed to do the first part (mpileup) but when I did index for the bcf file, I got an indexed bcf file, not a vcf file:
304.bcf.csi
. No vcf file was created...Ahh, yes, thanks. I edited command line!
Thanks! this makes more sense :)
After the last command line - the file containing the SNPs in the <out.bcf>?
Is it the same as the first bcf file?
You are welcome! :)
No, they are not the same. Read more here --> What's the difference between mpileup output and bcftools call ?
thank you! you are a saint! :)