I have an input set of variants in VCF format (input.vcf.gz
), which includes indels and SNPs. Multi-allelic variants are split across multiple lines. Using bcftools view input.vcf.gz | vcf2bed | bedtools merge
I then created a bed file targets.bed
.
I would like to call only this predetermined list of variants using samtools
and/or bcftools
(or other tools). How would I go about doing this?
I suppose I could specify the targets.bed
file as a parameter to bcftools mpileup
:
bcftools mpileup -R targets.bed --fasta-ref human_g1k_v37_decoy.fasta input.bam
... but that would still not allow me to specifically call the given list of variants in input.vcf.gz
. Any suggestions what I could do here?
Thanks ! This does the trick. I'm adding the
-A
parameter tobcftools call
to include alts even if not present in the genotypes. Also, interestingly the-v
(verbose) flag breaksbcftools call
and I had to remove it to get anything to output. Not sure why. Unfortunately I ran into this issue: https://github.com/samtools/bcftools/issues/887 when trying to addAD
tobcftools mpileup
which luckily was fixed but requires a github developmental branch installation to fix.P.s. it would make things a bit easier if
bcftools call -T
allowed a VCF as input rather than the specified targets format it currently requires. I wonder why this is? It's not very intuitive and though I had read this part of the bcftools manual I somehow had discarded it as unlikely to work for indels (even though it does seem to now after testing).BTW, I don't clearly understand the difference between
bcftools mpileup -T file.bed
andbcftools mpileup -R file.bed
. For a small testcase,-R
seems to be significantly faster.Hello floris.barthel ,
it was my fault with the
-v
.-v
inbcftools call
is for printing variant sites only and this is not what we intend to do when using-C
. So this two option are in conflict.Also using
-R
for specifying the regions during mpileup is a better choice than-T
. If using-R
bcftools can directly jump to the position in the file. If using-T
it must iterate over the whole file to find the regions listed.I edit my answer now :)
fin swimmer