Hi Everybody,
I ran freebayes on a bam-file to create a gvcf-file. There was no error during the SNP-call itself but when I tried to index the resulting file with tabix it gave me an error stating that there were unsorted positions in the file. In fact for some reason the very last record for each contig was moved onto the following contig as if the whole CHROM column was shifted by one position.
Here is an example of what it looks like:
CM009931.2 27754011 . T <*> 0 . DP=1;END=27754012;MIN_DP=1
GQ:DP:MIN_DP:QR:RO:QA:AO -0:0:0:-0:0:-0:0
CM009931.2 27754013 . G <*> 0 . DP=1;END=27754014;MIN_DP=1
GQ:DP:MIN_DP:QR:RO:QA:AO -0:0:0:-0:0:-0:0
CM009932.2 27754015 . . <*> 0 . DP=0;END=27754200;MIN_DP=0
GQ:DP:MIN_DP:QR:RO:QA:AO -0:0:0:-0:0:-0:0
CM009932.2 37 . A <*> 0 . DP=1;END=38;MIN_DP=1 GQ:DP:MIN_DP:QR:
RO:QA:AO -0:0:0:-0:0:-0:0
One can see based on the position values that the 3rd entry here should still be on Chrom CM009931, but for some reason here it is said to be on chrom CM009932. I have no idea how this could happen and would much appreciate any help on this issue.
This is the code I used to perform the SNP-call:
cat $BAMLST | parallel --no-notice -k -j $NSLOTS "echo {} && freebayes \
--fasta-reference $REF \
--ploidy 2 \
--report-genotype-likelihood-max \
--use-mapping-quality \
--genotype-qualities \
--use-best-n-alleles 20 \
--haplotype-length 3 \
--min-mapping-quality 40 \
--min-base-quality 30 \
--min-alternate-fraction $MINALTFRAC \
--min-alternate-total $MINALTN \
--min-coverage $MINCOV \
--use-reference-allele \
--gvcf \
--gvcf-chunk 1 \
--bam {} |\
bgzip -f -@ $NSLOTS -c /dev/stdin > $OUTPUTFOLDER/vcf-master/raw/{/.}.gvcf.gz \
&& tabix -fp vcf $OUTPUTFOLDER/vcf-master/raw/{/.}.gvcf.gz"
I used GNU parallel to run over a list of bamfiles in parallel, but in this case I had just one input-file to test it.
I previously ran a SNP-call without the --gvcf
and the --gvcf-chunk
options on the same bam file without any problems, so the bamfile itself should be ok I guess.