I extracted some information from a VCF file to produce summary statistics. I have 96 individuals that were sequenced with a RADseq protocol. I ran RTG's vcfstats. Here is my output (rearranged from the original output so that more information is visible at once).
Each line is an individual and the columns are in order:
[1] "Deletion Het/Hom ratio" "Deletions" "Indel Het/Hom ratio"
[4] "Indel/SNP+MNP ratio" "Indels" "Insertion Het/Hom ratio"
[7] "Insertion/Deletion ratio" "Insertions" "Missing Genotype"
[10] "MNP Het/Hom ratio" "MNPs" "Same as reference"
[13] "Sample Name" "SNP Het/Hom ratio" "SNP Transitions/Transversions"
[16] "SNPs" "Total Het/Hom ratio" "sp"
I was wondering whether it is normal or common to have 0 deletions and 0 indels. I find it weird to see only 0's. Am I the only one seeing this in their data?
I used BWA for the alignment. I specified nothing in my pipeline related to deletion/indel removal. Maybe it's a default parameter of one of the functions I used in the pipeline...
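One way to cross-check the zeros independently of vcfstats is to classify each record directly from the REF and ALT columns of the VCF. The snippet below is only a minimal sketch: the function names `classify` and `count_variant_types` are made up for illustration, it counts alleles per site rather than per-sample genotypes (so the numbers will not match the vcfstats per-individual columns), and it treats any length difference between REF and ALT as a plain insertion or deletion.

```python
from collections import Counter


def classify(ref, alt):
    """Classify one REF/ALT pair by comparing allele lengths (simplified)."""
    # Missing, spanning-deletion, symbolic and breakend alleles are not sized here.
    if alt in (".", "*") or alt.startswith("<") or "[" in alt or "]" in alt:
        return "other"
    if len(ref) == len(alt):
        return "SNP" if len(ref) == 1 else "MNP"
    return "insertion" if len(alt) > len(ref) else "deletion"


def count_variant_types(vcf_lines):
    """Count variant types over an iterable of uncompressed VCF text lines."""
    counts = Counter()
    for line in vcf_lines:
        # Skip meta-information lines, the column header and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4]  # columns 4 and 5 are REF and ALT
        for alt in alts.split(","):
            counts[classify(ref, alt)] += 1
    return counts
```

To use it, pass an open file handle, for example `count_variant_types(open("variants.vcf"))`, where `variants.vcf` is a placeholder name; a gzipped file would need `gzip.open(path, "rt")` instead. If this also reports no insertions or deletions, the indels were never written to the VCF, which points at the variant caller or an upstream step rather than at vcfstats.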
What was your variant caller, and what SAM/BAM version is the BWA output? I observed the same phenomenon with the BBMap aligner (which outputs SAM v1.4 by default) combined with FreeBayes (which expected v1.3 input). I solved it by adding the 'sam=1.3' flag to the aligner.
I have BWA 0.7.13-r1126 and I'm not sure how to check the SAM/BAM version. If it is the samtools version, it's 1.3.1 (using htslib 1.3.1); I also have VCFtools 0.1.15 and BCFtools 1.3.1 (using htslib 1.3.1). Could version 1.3.1 be a problem, or should I have 1.3?
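Note that the samtools software version (1.3.1 here) is a different thing from the SAM format version being asked about. The format version, when it is recorded, sits in the `VN:` tag of the `@HD` line at the top of the SAM header. A minimal sketch for reading it is below; the function name `sam_format_version` is hypothetical, and because the `@HD` line is optional the function returns `None` when it is absent.

```python
def sam_format_version(header_lines):
    """Return the VN value from the @HD header line, or None if there is none."""
    for line in header_lines:
        # Header lines start with '@'; stop at the first alignment record.
        if not line.startswith("@"):
            break
        if line.startswith("@HD"):
            # Fields after the record type are tab-separated TAG:VALUE pairs.
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("VN:"):
                    return field[3:]
    return None
```

For a plain-text SAM file, an open file handle can be passed in directly. For a BAM file, the header text can be printed first with `samtools view -H` and its lines passed to the function.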