Hello everyone,
I've looked at vcftools, bcftools, bedops, bedtools and GATK SelectVariants, but I don't see how I can address my requirement - I hope the community can help.
I want to subset a VCF file based on 4 attributes: CHROM, POS, REF and ALT. I have 2 VCF files, and I wish to exclude all entries from VCF1 that match an entry in VCF2 on all 4 values. Most of the above-mentioned tools work only on CHROM and POS, even if they accept a VCF file to filter by. As far as I can tell, none of them requires an exact match on the ALT allele, even if both VCF inputs were processed to split all multi-allelic sites.
The closest I can get is by using bcftools annotate to copy over an INFO attribute (say, INFO/AC) under a new name (say, INFO/DUMMY_AC) so I can filter by that new name. The manual on bcftools annotate states:
"When REF and ALT are present, only matching VCF records will be annotated."
That works for me when my ALT alleles are split, but it only marks the matching records; it does not filter them out.
Is there any subset tool that will help me compare by custom attributes or do I have to write my own script for it?
Thank you!
--
Ram
I'd say your requirements are sufficiently complicated to go to a scripting approach, for which I would use cyvcf2 (python).
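A minimal sketch of what such a script could look like, for illustration only. To keep it runnable without extra dependencies it uses plain text parsing from the Python standard library rather than cyvcf2, and the function names (variant_key, load_keys, exclude_matching) are made up for this example. It assumes uncompressed, tab-delimited VCF input, that multi-allelic sites have already been split and normalized the same way in both files, and that the set of keys from VCF2 fits in memory.

```python
import io


def variant_key(line):
    """Return (CHROM, POS, REF, ALT) for one tab-delimited VCF data line."""
    chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
    return chrom, int(pos), ref, alt


def load_keys(vcf_lines):
    """Collect the keys of all data records; header lines start with '#'."""
    return {
        variant_key(line)
        for line in vcf_lines
        if line.strip() and not line.startswith("#")
    }


def exclude_matching(vcf1_lines, vcf2_lines):
    """Yield VCF1 lines (headers included) whose key is absent from VCF2."""
    exclude = load_keys(vcf2_lines)
    for line in vcf1_lines:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#") or variant_key(line) not in exclude:
            yield line


# Toy in-memory inputs (hypothetical data, not from any real call set).
HEADER = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
)
VCF1 = HEADER + (
    "1\t100\t.\tA\tG\t.\t.\t.\n"   # same CHROM/POS/REF/ALT as VCF2: dropped
    "1\t100\t.\tA\tT\t.\t.\t.\n"   # same position, different ALT: kept
    "2\t200\t.\tC\tT\t.\t.\t.\n"   # same position, different ALT: kept
)
VCF2 = HEADER + (
    "1\t100\t.\tA\tG\t.\t.\t.\n"
    "2\t200\t.\tC\tG\t.\t.\t.\n"
)

kept = list(exclude_matching(io.StringIO(VCF1), io.StringIO(VCF2)))
print("".join(kept), end="")
```

For real files, the two io.StringIO objects would be replaced by file handles, for example open(path) or gzip.open(path, "rt") for bgzipped input. Only the record whose four values all match is removed; a position-only comparison would have removed all three.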
Hi Ram,
Extract and write records from A shared by both A and B using exact allele match
This should work!
Thanks
Najeeb
Thank you! It's been a while since I did this exercise so I am not sure if I checked out bcftools isec (I most probably did and either didn't see this example or excluded it for a reason). Would you happen to know what "exact allele match" means? I don't see its definition anywhere in the documentation. Also, I'd like to exclude those positions, not pick the intersect (but maybe I could have piped it to another command that does the actual exclusion).
Just run bcftools isec in a terminal and it prints exactly this message: "Extract and write records from A shared by both A and B using exact allele match".
How does this add to the conversation exactly? This is neither an explanation of the term "exact allele match" nor is it a pointer to excluding overlaps as opposed to picking them.