Thank you for your suggestions. I have been using FreeBayes and did not obtain results I aimed for, however it is still #1 pick in terms of detecting complex mutations. However in terms of total control over my output I think I need really something as samtools mpileup and filtering. Only problem is that it does not work well with complex mutations. So far I have this pipeline to find and process low frequency variants starting with input.bam alignments
First step I use since I need to fix MD field in my bam files. That is not mandatory for most other uses since I did custom editing of alignments and masking of primer sequences prior to variant calling. I just changed CIGAR but not MD tag, this will fix that.
samtools calmd -rb file.bam ref.fa | sponge file.bam
Second step is actual mpileup call.
samtools mpileup -d 99999 -t INFO/AD -uf ref.fa file.bam > file.mpi
In the third step I use some filter but no bayesian or other approaches to variant calling. Just manual filtering of variants with coverage > 250 and frequency > 1%.
bcftools filter -i "AD[1]/DP > 0.01" file.mpi | bcftools filter -i "DP>250" | bcftools call -m -A | sed 's/,Version="3"//g' > file.vcf
Last sed is a heck to use next tool. I need to split genotypes for further filtering. Thus I employ utility called vcf_parser to do exactly that.
vcf_parser file.vcf --split | sponge file.vcf
Next I can perform second round of filtering with all alternate alleles. I also can annotate each allele separately.
bcftools filter -i "AD[1]/DP > 0.01" file.vcf | bcftools filter -i "AD[1] > 5" | sponge file.vcf
This is my current pipeline.
I would also like to merge output with Freebayes output. And in case there are complex variants detected by Freebayes to do some kind of merge to final vcf file. That would mean for each complex mutation found by Freebayes I need to find all SNP's and INDEL's in mpileup file that add up to the variant and remove them from former vcf and add in agregate variant from Freebayes but I don't know how to do that.
Vojtech.
Hi, I don't have any matched normal samples and thus I can't use tools such as Strelka. I can only compare to reference genome. Your suggestions about using samtools mpileup or bam-readcount are probably in a good direction. However I would like my output to be vcf file. And I also would like to retain MNPs, large INDELs and complex mutations. I don't know how I would infer these complex mutations from samtools mpileup or bam-readcount outputs? Thanks, Vojtech.
hi, MuTect can be run on single sample as well. Check here. Though it is not that sensitive to report 1 read variants, but you would get VCF as out. For complex mut., I would suggest Lumpy or Delly. Both give VCFs as out and are able to run on single sample input (BAM)