Hi,
the problem is that the page http://samtools.sourceforge.net/mpileup.shtml is outdated. Unfortunately there is not available some recent documentation of samtools mpileup command.
Thus if you run
samtools mpileup -uf ref.fa aln1.bam aln2.bam | bcftools view -bvcg - > var.raw.bcf
you get into problems. The page refer to bcftools 0.1.19 while the current version of the package is bcftools 1.3.0. In former documentation http://www.htslib.org/doc/samtools-0.1.19.html and current documentation http://www.htslib.org/doc/bcftools.html there are differences. In particular for calling variants we have to use another tool bcftools call. I will summarize these differences
-b
Output in the BCF format. The default is VCF. while in the new version this switch is replaced by
--output-type b
-v
Output variant sites only (force -c
) I guess that option in the new version should read bcftools call -v
-c
Call variants using Bayesian inference. This option automatically invokes option -e
. I don't think in the new documentation there is option for that.
-g
Call per-sample genotypes at variant sites (force -c
) I don't know even what does it mean nor what is its counterpart in the new version of bcftools.
Thus I think in the updated version of variant calling documentation might be command such as this
samtools mpileup -uf ref.fa aln1.bam aln2.bam | bcftools call -c -v --output-type b | bcftools view -m 2 --output-type b
In my case where I need minor variants at low frequencies I doubt that this aproach would not work anyway since bcftools does expect some ploidy of the genome which does roughly speaking mean that they expect that variants are either in 100% reads or 50% reads or they do not exist. In my case when I can have some clone with somatic variants in 1% of the reads, that would be stripped off. Another problem is that from the row for example
chr20 31022959 . T C 221.999 . DP=4024;VDB=0;SGB=-0.693147;RPB=0.999999;MQB=1;BQB=0.939023;MQ0F=0;AF1=1;AC1=2;DP4=7,0,4010,0;MQ=60;FQ=-281.989;PV4=1,1,1,1
GT:PL 1/1:255,255,0
I am not able to tell what is a frequency and the read depth for alternate allele. Or am I wrong?
Vojtech.
You are right. The version of bcftools I am using now is 1.1. There is a little bit difference between older and newer version. For example: in version 1.1, commands create .bcf -> .bam and .vcf -> .bcf are differ from commands used in version 1.0. I found following command and it worked ok in bcftools 1.1. Thank you.