Hi all,
I am trying to use bcftools consensus (version 1.3.1) to get consensus sequences using a reference FASTA file and a multisample VCF.
The problem is that when bases are missing (I can see this in my BAM files), bcftools consensus prints the reference allele.
I would like to have these as missing because I want to be able to see structural variations (some genes will actually be absent from my samples...).
An example line from my VCF is:
LOC_Os08g14850_chr8_8938483..8952092_UTR-0 1 . G . 999 . . GT:DP ./.:0 ./.:1 ./.:0 ./.:0 ./.:0 ./.:0 ./.:0 ./.:0 ./.:0 ./.:0 ./.:0 ./.:0 ./.:0 ./.:0 ./.:1 ./.:0 ./.:0 ./.:1 ./.:1 ./.:0 ./.:0 ./.:0 ./.:0 ./.:3 ./.:0 ./.:0 ./.:0 ./.:0 ./.:0 ./.:1 ./.:0 ./.:2 ./.:0 ./.:1 ./.:0
The first position in the gene appears to not have been sequenced in any of my samples. However, the reference allele (a G in that case) is printed in my consensus fasta files...
Any help appreciated.
Many thanks,
Agathe
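You might need to make an all-sites VCF, one that has a record for every single position in the reference. bcftools consensus works by applying the variants it finds in the VCF to the reference sequence, so it is likely assuming that 'no news is good news' at loci where the VCF gives it nothing to apply, and it leaves the reference base in place.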
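If you do end up with an all-sites VCF, one way to use it is to turn the uncovered positions into a per-sample BED file and mask those positions in the consensus. Below is a rough Python sketch of that idea, not a tested pipeline. It assumes a tab-delimited, uncompressed VCF with GT and DP in the FORMAT column (as in your example line), and the function name `missing_sites_to_bed` is just something I made up for illustration.

```python
def missing_sites_to_bed(vcf_lines, sample):
    """Yield (chrom, start, end) BED intervals (0-based, half-open) for every
    site where `sample` has a missing genotype or a depth of zero.

    Assumption: `vcf_lines` is an iterable of lines from a tab-delimited,
    all-sites VCF that includes the #CHROM header line.
    """
    sample_idx = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Locate the column that holds this sample's genotype data.
            sample_idx = fields.index(sample)
            continue
        if sample_idx is None:
            raise ValueError("no #CHROM header line found before the records")

        chrom, pos = fields[0], int(fields[1])
        # Pair FORMAT keys (e.g. GT:DP) with this sample's values (e.g. ./.:0).
        call = dict(zip(fields[8].split(":"), fields[sample_idx].split(":")))

        alleles = call.get("GT", ".").replace("|", "/").split("/")
        no_call = "." in alleles           # ./. or a half-missing genotype
        no_depth = call.get("DP") == "0"   # explicitly zero coverage

        if no_call or no_depth:
            yield chrom, pos - 1, pos      # VCF is 1-based, BED is 0-based
```

You would write each yielded interval as a tab-separated line to something like sample.bed (adjacent positions are not merged here, which is wasteful but harmless). bcftools consensus has a -m/--mask option that takes a file of regions and writes N at those positions, so the BED file can be passed there together with -s for the matching sample. Check the man page for your version first; newer releases may also offer more direct ways of handling missing genotypes.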