Hi all,
This is a slightly strange issue and I'm sure that there's something simple I'm doing wrong, but if anyone could help me see what it is I'd be thrilled as this has been a very annoying issue for me!
Here is the situation. I have a big multi-sample VCF file with every variant found by sequencing. I also have a filtered version of this file which has low-coverage SNPs, all INDELs, and all variants that are only unique to the reference genome removed.
I would like to make a fasta files for each sample based off the filtered multi-sample VCF file. My first plan was to extract each individual from the filtered VCF file using VCFTools, then generate a fasta file vcf-consensus:
vcftools --vcf ../Main_NoInvariants.vcf --indv CL273_8_sorted.bam --positions positions.txt --recode --out CL273_8
bgzip -c CL273_8.recode.vcf > CL273_8.vcf.gz
tabix -p CL273_8.vcf.gz
cat reference.fa | bcftools consensus CL273_8.vcf.gz > CL273_8.fa
However, I soon realised that this could be done much more quickly in bcftools:
bcftools consensus -i -s CL273_8_sorted.bam -f reference.fa Main_NoInvariants.vcf.gz > CL237_8_1.fa
But... Neither of these worked. Both produced fasta files but they were identical to the reference (and I have confirmed that there are indeed differences between this sample and the reference in the filtered VCF file). No error message is produced in either case. I've also tried extracting individuals with vcftools using the --recode-INFO-all
, to preserve info, but it didn't make any difference.
What's weird though is that both methods DO work when I use the original, unfiltered VCF file. From a lot of fiddling about I've realised that the problem seems to be the lack of INDELs in the filtered VCF file. I know this because both methods will work on virtually any VCF file, then as soon as I remove the INDELs, it stops working.
I've tried two ways to remove INDELs; first just using VCFTools --remove-indels
and second using sed
to delete any line with "INDEL" in it. No good.
So I'm completely at a loss now. Any help much appreciated.
Thank you!
Hello rc16955 ,
CL273_8_sorted.bam
(with .bam)?fin swimmer